## Quantum chemical many-body theory on heterogeneous nodes

Symposium on Application Accelerators in High-Performance Computing (SAAHPC), 2011

@inproceedings{deprince2011quantum,

title={Quantum chemical many-body theory on heterogeneous nodes},

author={DePrince III, A.E. and Hammond, J.R.},

booktitle={Application Accelerators in High-Performance Computing (SAAHPC), 2011 Symposium on},

pages={131--140},

year={2011},

organization={IEEE}

}

The iterative solution of the coupled-cluster with single and double excitations (CCSD) equations is a very time-consuming component of the "gold standard" in quantum chemistry, the CCSD(T) method. In an effort to accelerate accurate quantum mechanical calculations, we explore two implementation strategies for the iterative solution of the CC equations on graphics processing units (GPUs). We consider a communication-avoiding algorithm for the spin-free coupled-cluster doubles (CCD) equations, followed by a low-storage algorithm for the spin-free CCSD equations. In the communication-avoiding algorithm, the entire iterative procedure for the CCD method is performed on the GPU, resulting in accelerations of a factor of 4-5 relative to the pure CPU algorithm. The low-storage CCSD algorithm requires that a minimum of $4o^2v^2+2ov$ elements be stored on the device, where $o$ and $v$ represent the number of orbitals occupied and unoccupied in the reference configuration, respectively. The algorithm masks the transfer time for copying large amounts of data to the GPU by overlapping GPU and CPU computations. The per-iteration costs of this hybrid GPU/CPU algorithm are lower by a factor of up to 4.06 than those of the pure CPU algorithm and by a factor of up to 10.63 than those of the CCSD implementation found in the Molpro electronic structure package. These results provide insight into how to organize communication and computation so as to maximize utilization of a GPU and a multicore CPU at the same time.
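To give a feel for the storage requirement quoted above, the following minimal sketch evaluates $4o^2v^2+2ov$ for a given number of occupied and unoccupied orbitals and converts it to bytes. The function names, the example values of `o` and `v`, and the default of 8 bytes per element (double precision) are assumptions made for illustration; the abstract does not state them.

```python
def ccsd_min_device_elements(o: int, v: int) -> int:
    """Minimum element count from the abstract: 4*o^2*v^2 + 2*o*v."""
    if o < 0 or v < 0:
        raise ValueError("orbital counts must be non-negative")
    return 4 * o**2 * v**2 + 2 * o * v


def ccsd_min_device_bytes(o: int, v: int, bytes_per_element: int = 8) -> int:
    """Convert the element count to bytes.

    bytes_per_element=8 assumes double precision; this is an assumption,
    not something the abstract specifies.
    """
    return ccsd_min_device_elements(o, v) * bytes_per_element


if __name__ == "__main__":
    # Hypothetical system size, chosen only for illustration.
    o, v = 20, 200
    n_bytes = ccsd_min_device_bytes(o, v)
    print(f"o={o}, v={v}: {n_bytes / 2**30:.2f} GiB minimum on the device")
```

Because the leading term grows as $o^2v^2$, doubling both orbital counts raises the minimum footprint by roughly a factor of 16, which suggests why device memory is a central constraint for an algorithm of this kind.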

November 1, 2011 by hgpu