high performance computing on graphics processing units: hgpu.org

Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics

Ronald Babich, Michael A. Clark, Balint Joo

Center for Computational Science, Boston University, Boston, Massachusetts 02215, USA

arXiv:1011.0024 [hep-lat] (29 Oct 2010)

@article{babich2010parallelizing,
  title={Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics},
  author={Babich, R. and Clark, M.A. and Jo{\'o}, B.},
  journal={arXiv preprint arXiv:1011.0024},
  year={2010}
}

Package: QUDA: A library for QCD on GPUs

Graphics Processing Units (GPUs) are having a transformational effect on numerical lattice quantum chromodynamics (LQCD) calculations of importance in nuclear and particle physics. The QUDA library provides a package of mixed-precision sparse-matrix linear solvers for LQCD applications, supporting single GPUs based on NVIDIA’s Compute Unified Device Architecture (CUDA). This library, interfaced to the QDP++/Chroma framework for LQCD calculations, is currently in production use on the “9g” cluster at the Jefferson Laboratory, enabling unprecedented price/performance for a range of problems in LQCD. Nevertheless, memory constraints on current GPU devices limit the problem sizes that can be tackled. In this contribution, we describe the parallelization of the QUDA library onto multiple GPUs using MPI, including strategies for overlapping communication and computation. We report on both weak and strong scaling for up to 32 GPUs interconnected by InfiniBand, on which we sustain in excess of 4 Tflops.
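The abstract mentions splitting the lattice across devices and overlapping communication with computation. The sketch below is a toy illustration of that general idea and not QUDA's implementation or API: a 1D periodic nearest-neighbour stencil is divided among several "ranks" (plain Python lists standing in for GPUs), interior sites are updated first because they need no remote data, and boundary sites are updated once the halo values are available. All function names here are hypothetical, and the halo "exchange" is an ordinary list lookup rather than an MPI call.

```python
# Toy sketch only. Names are hypothetical and do not come from QUDA.

def stencil_global(field):
    """Reference result: out[i] = field[i-1] + field[i+1], periodic lattice."""
    n = len(field)
    return [field[(i - 1) % n] + field[(i + 1) % n] for i in range(n)]


def split(field, num_ranks):
    """Divide the lattice into equal contiguous sub-lattices, one per rank."""
    assert len(field) % num_ranks == 0, "lattice must divide evenly"
    local_n = len(field) // num_ranks
    return [field[r * local_n:(r + 1) * local_n] for r in range(num_ranks)]


def stencil_decomposed(field, num_ranks):
    """Apply the same stencil rank by rank, using one halo site per side."""
    sublattices = split(field, num_ranks)
    result = []
    for rank, local in enumerate(sublattices):
        n = len(local)
        out = [0] * n

        # Step 1: "start" the halo exchange. In an MPI code this would be a
        # non-blocking send/receive, so step 2 could run while it is in flight.
        left_halo = sublattices[(rank - 1) % num_ranks][-1]
        right_halo = sublattices[(rank + 1) % num_ranks][0]

        # Step 2: interior sites depend only on local data.
        for i in range(1, n - 1):
            out[i] = local[i - 1] + local[i + 1]

        # Step 3: boundary sites, computed after the halos have "arrived".
        def site(i):
            if i < 0:
                return left_halo
            if i >= n:
                return right_halo
            return local[i]

        for i in {0, n - 1}:
            out[i] = site(i - 1) + site(i + 1)

        result.extend(out)
    return result


if __name__ == "__main__":
    lattice = list(range(16))
    print(stencil_decomposed(lattice, 4) == stencil_global(lattice))
```

The fraction of sites in step 2 shrinks as the local sub-lattice gets smaller, which is one reason strong scaling (fixed total lattice, more devices) is generally harder than weak scaling (fixed lattice per device).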

Tags: CUDA, GPU cluster, High Energy Physics – Lattice, Monte Carlo simulation, nVidia, nVidia GeForce 8800 GTX, nVidia GeForce GTX 285, nVidia GeForce GTX 480, Package, Physics, QCD, Tesla C1060, Tesla C2050, Tesla C870

November 9, 2010 by hgpu
