Towards microsecond biological molecular dynamics simulations on hybrid processors
Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
International Conference on High Performance Computing and Simulation (HPCS), June 2010, pp. 98-107
@conference{hampton2010towards,
title={Towards microsecond biological molecular dynamics simulations on hybrid processors},
author={Hampton, S. and Agarwal, P.K. and Alam, S.R. and Crozier, P.S.},
booktitle={High Performance Computing and Simulation (HPCS), 2010 International Conference on},
pages={98--107},
year={2010},
organization={IEEE}
}
Biomolecular simulations are becoming an increasingly important component of molecular biochemistry and biophysics investigations. Performance improvements in simulations based on molecular dynamics (MD) codes are widely desired, driven in particular by the rapid growth of biological data due to improvements in experimental techniques. Unfortunately, the factors that allowed past performance improvements of MD simulations, particularly increases in microprocessor clock frequencies, are no longer delivering gains. Hence, novel software and hardware solutions are being explored to accelerate popular MD codes. In this paper, we describe our efforts to port and optimize LAMMPS, a popular MD framework, on hybrid processors: multi-core processors accelerated by graphics processing units (GPUs). Our implementation is based on porting the computationally expensive non-bonded interaction terms to the GPUs and overlapping computation on the CPU and GPUs. This functionality is built on top of the Message Passing Interface (MPI), which allows multi-level parallelism to be extracted even at the workstation level with multi-core CPUs, and allows the implementation to be extended to GPU clusters. Results from a number of typically sized biomolecular systems are provided, and analysis is performed on three generations of NVIDIA GPUs. Our implementation allows up to 30-40 ns/day throughput on a single workstation as well as significant speedup over the Cray XT5, a high-end supercomputing platform. Moreover, detailed analysis of the implementation indicates that further code optimization and improvements in GPUs will allow ~100 ns/day throughput on workstations and inexpensive GPU clusters, putting the widely desired microsecond simulation time-scale within reach of a large user community.
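To put the throughput figures in the abstract into perspective, the short Python sketch below converts a rate in ns/day into the wall-clock days needed to simulate one microsecond (1000 ns). The function name days_to_simulate and the constant MICROSECOND_NS are illustrative and do not come from the paper; the rates used are the 30-40 ns/day and ~100 ns/day values quoted above, and the calculation assumes the rate stays constant for the whole run.

```python
def days_to_simulate(target_ns: float, throughput_ns_per_day: float) -> float:
    """Wall-clock days needed to simulate target_ns at a constant throughput."""
    if throughput_ns_per_day <= 0:
        raise ValueError("throughput must be positive")
    return target_ns / throughput_ns_per_day


MICROSECOND_NS = 1000.0  # 1 microsecond of simulated time, expressed in ns

# Rates taken from the abstract: 30-40 ns/day today, ~100 ns/day projected.
for rate in (30.0, 40.0, 100.0):
    days = days_to_simulate(MICROSECOND_NS, rate)
    print(f"{rate:.0f} ns/day -> {days:.1f} days per microsecond")
```

Under these assumptions, a microsecond of simulated time takes roughly 25 to 33 days at the reported single-workstation rates and about 10 days at the projected rate.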
January 19, 2011 by hgpu