
Tight Binding Molecular Dynamics on CPU and GPU clusters

Dimitar Pashov
Department of Physics, King’s College London, WC2R 2LS, UK
King’s College London, 2013
@article{pashov2013tight,
  title={Tight Binding Molecular Dynamics on CPU and GPU clusters},
  author={Pashov, Dimitar},
  year={2013}
}


The aim of this dCSE project was to improve the TBE code, which is based on the tight binding model with self-consistent multipole charge transfer. Given an appropriate parameterisation, the code is general and can be used to simulate a wide variety of systems and phenomena, such as bond breaking and charge and magnetic polarisation. The first goal was to achieve better performance by parallelising all suitable routines with MPI. The next step was to integrate ScaLAPACK's parallel diagonalisation routines transparently and with minimal communication, allowing the code to run on multi-node machines rather than only on the single node already possible thanks to threaded LAPACK/BLAS libraries. The third and last task was to use GPUs as accelerators for the heavy linear algebra calculations and then integrate this with the MPI parallelisation.

The goals of the first two work packages were achieved largely as planned, with significant benefit gained from exploiting the sparsity of the tight binding Hamiltonian and from reformulating the algorithms for calculating the density matrix elements and related quantities. The electrostatics routines have also seen a significant reduction in memory usage and a parallel speedup. A generic interface for all required diagonalisation routines was developed, together with the related transparent communication routines; it is now available to all programs in the LMTO suite, of which TBE is part. Nearly all of the code has been updated to Fortran 90 and later standards, making it easier and much safer to work with.

The third task, the GPU port of TBE, was the more exciting and riskier part of the project, and it did not disappoint in terms of the challenges it provided. The original intention was to minimise risk by avoiding native development as much as possible and relying on established libraries instead. Unexpectedly, the GPU diagonalisation routines were nowhere near as fast as hoped, which steered us into slightly uncharted territory: writing CUDA code for a number of matrix operations and researching completely different algorithms for obtaining the density matrix. Eventually the goals were accomplished, although the acceleration still falls short of what was hoped for.
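The density-matrix calculation at the heart of the project can be illustrated with a minimal sketch. This is not the TBE implementation (which is Fortran/ScaLAPACK/CUDA); the function name, the toy chain Hamiltonian, and the Fermi-Dirac occupation choice are assumptions for illustration. The idea is the standard one: diagonalise the tight binding Hamiltonian and assemble the density matrix as a sum of eigenvector projectors weighted by occupations.

```python
import numpy as np

def density_matrix(H, n_electrons, kT=0.01):
    """Illustrative sketch: one-particle density matrix from a
    tight-binding Hamiltonian H via full diagonalisation."""
    eps, psi = np.linalg.eigh(H)            # eigenvalues in ascending order
    # Crude Fermi level: midpoint of the HOMO-LUMO gap (spin-degenerate levels)
    n_occ = n_electrons // 2
    mu = 0.5 * (eps[n_occ - 1] + eps[n_occ])
    f = 1.0 / (1.0 + np.exp((eps - mu) / kT))  # Fermi-Dirac occupations
    # rho = 2 * sum_n f_n |psi_n><psi_n|  (factor 2 for spin)
    return 2.0 * (psi * f) @ psi.conj().T

# Toy 4-site chain with nearest-neighbour hopping t = -1
t = -1.0
H = np.diag([t, t, t], 1)
H = H + H.T
rho = density_matrix(H, n_electrons=4)
print(np.trace(rho))   # ≈ 4.0: trace of rho recovers the electron count
```

In a production code this dense eigensolve is exactly the expensive step the project targets, first with ScaLAPACK's distributed diagonalisers across MPI ranks and then with GPU-accelerated linear algebra; exploiting the sparsity of H, as the report describes, avoids touching the many zero hopping elements.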

