GPU Acceleration of the Variational Monte Carlo Method for Many Body Physics
Anna University, Chennai
Anna University, 2013
@mastersthesis{ragavan2013gpu,
title={{GPU} Acceleration of the Variational {Monte Carlo} Method for Many Body Physics},
author={Ragavan, Rajagopalan Kaushik},
year={2013},
school={Department of Electrical and Computer Engineering, Louisiana State University and Agricultural and Mechanical College}
}
High-performance computing is one of the major areas driving large-scale simulation. Applications such as 3D nuclear test simulations, molecular dynamics, and quantum Monte Carlo are now developed on supercomputers using the latest computing technologies. According to the TOP500 list, many of today's leading supercomputers are heterogeneous, combining multi-core CPUs with massively parallel graphics processing units (GPUs) to increase computational capacity. The variational Monte Carlo (VMC) method is used in many-body physics to study the ground-state properties of a system. The trial wavefunction depends on a set of variational parameters that encode the physics of the system and are tuned for a better prediction; in general, they are chosen to realize some form of order or broken symmetry, such as superconductivity or magnetism. The variational approach is computationally expensive and requires a large number of Markov chains (MCs) to converge. The MCs exhibit abundant data parallelism, but parallelizing them across CPU clusters is expensive and does not scale in proportion to the system size, which makes the method a suitable candidate for a massively parallel GPU. In this research, we discuss the optimization and parallelization strategies adopted to port the VMC method to an NVIDIA GPU using CUDA. We obtained a speedup of nearly 3.85x over the MPI implementation [4] and of up to 19x over an object-oriented C++ code.
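To illustrate why many independent Markov chains are naturally data parallel, here is a minimal CPU sketch in Python. It is not the thesis's CUDA code. It assumes a toy problem the abstract does not mention: a 1D harmonic oscillator (units with hbar = m = omega = 1) and a trial wavefunction psi(x) = exp(-alpha x^2) with a single variational parameter alpha. The names `local_energy` and `vmc_energy` are hypothetical. Each chain's Metropolis update reads and writes only its own walker, so the inner loop over chains is the part a GPU port could assign to one thread per chain. For this toy model the exact variational energy is E(alpha) = alpha/2 + 1/(8 alpha), which is minimized at alpha = 1/2 with E = 1/2.

```python
import math
import random


def local_energy(x: float, alpha: float) -> float:
    """Local energy (H psi)/psi for psi = exp(-alpha x^2), H = -1/2 d^2/dx^2 + 1/2 x^2.

    Toy model assumed for illustration only.
    """
    return alpha + x * x * (0.5 - 2.0 * alpha * alpha)


def vmc_energy(alpha: float, n_chains: int = 128, n_steps: int = 2000,
               burn_in: int = 200, step: float = 2.0, seed: int = 0) -> float:
    """Estimate <E> by averaging the local energy over many independent Metropolis chains."""
    rng = random.Random(seed)
    positions = [0.0] * n_chains  # one walker per Markov chain
    total, count = 0.0, 0
    for t in range(n_steps):
        # Each chain is updated independently of the others: this loop is the
        # data-parallel work that would map to one CUDA thread per chain.
        for c in range(n_chains):
            x_old = positions[c]
            x_new = x_old + step * (rng.random() - 0.5)  # uniform trial move
            # Metropolis test on |psi(x_new) / psi(x_old)|^2, done in log space
            log_ratio = -2.0 * alpha * (x_new * x_new - x_old * x_old)
            if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
                positions[c] = x_new
            if t >= burn_in:  # discard equilibration steps
                total += local_energy(positions[c], alpha)
                count += 1
    return total / count


if __name__ == "__main__":
    # Scan the variational parameter; the minimum should appear near alpha = 0.5.
    for a in (0.4, 0.5, 0.6):
        print(a, vmc_energy(a))
```

For simplicity this sketch shares one random-number generator across all chains. A GPU version would instead give each chain its own random-number stream (for example, a per-thread cuRAND state) so the chains stay statistically independent while running concurrently.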
May 1, 2013 by hgpu