high performance computing on graphics processing units: hgpu.org

Tight Binding Molecular Dynamics on CPU and GPU clusters

Dimitar Pashov

Department of Physics, King’s College London, WC2R 2LS, UK

King’s College London, 2013


The aim of this dCSE project was to improve the TBE code, which is based on the tight binding model with self-consistent multipole charge transfer. Given an appropriate parameterisation, the code is general and can be used to simulate a wide variety of systems and phenomena, such as bond breaking and charge and magnetic polarisation. The first goal was to achieve better performance by parallelising all suitable routines with MPI. The next step was to integrate ScaLAPACK’s parallel diagonalisation routines transparently and with minimal communication, thus allowing the code to run on multi-node machines rather than on a single node, which was already possible thanks to threaded LAPACK/BLAS libraries. The third and last task was to utilise GPUs as accelerators for the heavy linear algebra calculations and subsequently integrate this with the MPI parallelisation. The goals of the first two work packages were achieved mostly as planned, with significant benefit gained from exploiting the sparsity of the tight binding Hamiltonian and reformulating the algorithms for calculating the density matrix elements and related quantities. The electrostatics routines have also seen a significant reduction in memory usage as well as a parallel speedup. A generic diagonalisation interface for all required diagonalisation routines was developed, together with the related transparent communication routines. This is now available for all programs in the LMTO suite, of which TBE is part. Nearly all of the code has been updated to Fortran 90 and later standards, making it easier and much safer to work with.

The third task, GPU porting of TBE, was the more exciting and riskier part of the project, and it did not disappoint in terms of the challenges it provided. The original intention was to minimise risk by avoiding native development as much as possible and using established libraries instead. However, the diagonalisation routines were nowhere near as fast as expected, and this steered us into slightly uncharted territory: writing CUDA code for a number of matrix operations and researching completely different algorithms for obtaining the density matrix. Eventually, the goals were accomplished, even though the acceleration is still far from what was hoped for.

Tags: Algorithms, CUDA, Electrostatics, Fortran, GPU cluster, Linear Algebra, Molecular dynamics, MPI, nVidia, Physics, Tesla K20

August 23, 2013 by hgpu
