Quantum Monte Carlo on graphical processing units

A. Anderson, W. Goddard, P. Schröder
Materials and Process Simulation Center, Division of Chemistry and Chemical Engineering, California Institute of Technology (MC 139-74), Pasadena, CA 91125, USA
Computer Physics Communications, Vol. 177, No. 3. (01 August 2007), pp. 298-306.

@article{anderson2007quantum,
   title={Quantum Monte Carlo on graphical processing units},
   author={Anderson, A.G. and Goddard III, W.A. and Schr{\"o}der, P.},
   journal={Computer Physics Communications},
   volume={177},
   number={3},
   pages={298--306},
   issn={0010-4655},
   year={2007},
   publisher={Elsevier}
}

Quantum Monte Carlo (QMC) is among the most accurate methods for solving the time-independent Schrödinger equation. Unfortunately, the method is very expensive and requires a vast array of computing resources in order to obtain results of a reasonable convergence level. On the other hand, the method is not only easily parallelizable across CPU clusters, but as we report here, it also has a high degree of data parallelism. This facilitates the use of recent technological advances in Graphical Processing Units (GPUs), a powerful type of processor well known to computer gamers. In this paper we report on an end-to-end QMC application with core elements of the algorithm running on a GPU. With individual kernels achieving as much as 30× speedup, the overall application performs at up to 6× faster relative to an optimized CPU implementation, yet requires only a modest increase in hardware cost. This demonstrates the speedup improvements possible for QMC in running on advanced hardware, thus exploring a path toward providing QMC level accuracy as a more standard tool. The major current challenge in running codes of this type on the GPU arises from the lack of fully compliant IEEE floating point implementations. To achieve better accuracy we propose the use of the Kahan summation formula in matrix multiplications. While this drops overall performance, we demonstrate that the proposed new algorithm can match CPU single precision.
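
The abstract names Kahan summation in matrix multiplications but does not show it. The following is a minimal sketch of the idea, not the authors' GPU kernel. It is written in plain Python and emulates single precision by rounding every intermediate result through `struct`. The names `f32`, `dot_naive`, `dot_kahan` and `matmul_kahan` are hypothetical and chosen only for illustration.

```python
import struct


def f32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_naive(a, b):
    """Single-precision dot product with plain accumulation."""
    s = 0.0
    for x, y in zip(a, b):
        s = f32(s + f32(x * y))
    return s


def dot_kahan(a, b):
    """Single-precision dot product with Kahan compensated accumulation."""
    s = 0.0  # running sum
    c = 0.0  # running estimate of the rounding error already made in s
    for x, y in zip(a, b):
        term = f32(f32(x * y) - c)   # next product, corrected by the stored error
        t = f32(s + term)            # low-order bits of term may be lost here
        c = f32(f32(t - s) - term)   # recover what was lost (negated)
        s = t
    return s


def matmul_kahan(A, B):
    """Matrix product (lists of rows) where each entry uses dot_kahan."""
    cols = list(zip(*B))
    return [[dot_kahan(row, col) for col in cols] for row in A]


# One large term followed by many terms too small to register individually.
a = [1.0] + [1e-8] * 1000
b = [1.0] * 1001
naive = dot_naive(a, b)
kahan = dot_kahan(a, b)
```

In this example the exact sum is close to 1.00001. The plain accumulation stays at exactly 1.0, because each 1e-8 term is smaller than half a unit in the last place of 1.0 in single precision, while the compensated version carries the lost amounts forward in `c` and ends within a few single-precision rounding errors of the exact value. The cost is several extra additions per term, which is consistent with the performance drop the abstract mentions.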