Advancing Large Scale Many-Body QMC Simulations on GPU Accelerated Multicore Systems

Andres Tomas, Chia-Chen Chang, Richard Scalettar, Zhaojun Bai
Department of Computer Science, University of California, Davis, CA 95616, USA
26th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2012), 2012








The Determinant Quantum Monte Carlo (DQMC) method is one of the most powerful approaches for understanding properties of an important class of materials with strongly interacting electrons, including magnets and superconductors. It treats these interactions exactly, but the solution of a system of N electrons must be extrapolated to bulk values. Currently N ~ 500 is state-of-the-art. Increasing N is required before DQMC can be used to model newly synthesized materials like functional multilayers. DQMC requires millions of linear algebra computations on matrices of order N, and its cost scales as N^3. DQMC cannot exploit parallel distributed memory computers efficiently because of limited scalability at these small matrix sizes and the stringent procedures required for numerical stability. Today, the combination of multisocket multicore processors and GPUs provides widely available platforms with new opportunities for DQMC parallelization. The kernel of DQMC, the calculation of the Green’s function, involves long products of matrices. For numerical stability, these products must be computed using graded decompositions generated by the QR decomposition with column pivoting. The high communication overhead of pivoting limits parallel efficiency. In this paper, we propose a novel approach that exploits the progressive graded structure to reduce the communication costs of pivoting. We show that this method preserves the same numerical stability and achieves 70% of the performance of highly optimized DGEMM on a two-socket six-core Intel processor. We have integrated this new method and other parallelization techniques into QUEST, a modern DQMC simulation package. Using 36 hours on this Intel processor, we are able to accurately compute the magnetic properties and Fermi surface of a system of N = 1024 electrons. This simulation is almost an order of magnitude more difficult than N ~ 500, owing to the N^3 scaling (a factor of (1024/500)^3 ≈ 8.6).
This increase in system size will allow, for the first time, the computation of the magnetic and transport properties of layered materials with DQMC. In addition, we present preliminary results showing that GPUs can further accelerate DQMC simulations.
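
To make the abstract's "graded decompositions generated by the QR decomposition with column pivoting" concrete, here is a minimal Python sketch of the conventional way to accumulate a long matrix product in the form Q · diag(d) · T. It illustrates the baseline pivoted-QR procedure only. It is not the reduced-communication method the paper proposes, and it is not code from QUEST. The function name `graded_product` and the test matrices are hypothetical; the sketch assumes NumPy and SciPy are available and that no diagonal entry of R is exactly zero.

```python
import numpy as np
from scipy.linalg import qr


def graded_product(mats):
    """Accumulate mats[-1] @ ... @ mats[0] as Q @ diag(d) @ T.

    Q is orthogonal, d holds the widely varying scales (the "grading"),
    and T stays well conditioned because each update multiplies it by a
    unit-diagonal triangular factor and a row permutation.
    """
    n = mats[0].shape[0]
    Q = np.eye(n)
    d = np.ones(n)
    T = np.eye(n)
    for B in mats:
        # Multiply by the orthogonal factor first, then scale the columns,
        # so large and small scales are never mixed by addition.
        C = (B @ Q) * d                      # same as B @ Q @ diag(d)
        Q, R, piv = qr(C, pivoting=True)     # C[:, piv] == Q @ R
        d = np.diag(R).copy()                # new scales (assumed nonzero)
        # C == Q @ R @ P.T, and P.T @ T is just T with rows reordered by piv.
        T = (R / d[:, None]) @ T[piv, :]
    return Q, d, T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    factors = [rng.standard_normal((6, 6)) for _ in range(8)]
    Q, d, T = graded_product(factors)
    direct = np.linalg.multi_dot(factors[::-1])
    print(np.allclose(Q @ np.diag(d) @ T, direct))
```

Each step calls a pivoted QR factorization, the operation whose communication overhead the abstract identifies as the limit on parallel efficiency.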