high performance computing on graphics processing units: hgpu.org


Accelerating Lattice QCD Multigrid on GPUs Using Fine-Grained Parallelization

M. A. Clark, Balint Joo, Alexei Strelchenko, Michael Cheng, Arjun Gambhir, Richard Brower

NVIDIA Corporation, 2701 San Tomas Expressway, Santa Clara, CA 95050, USA

arXiv:1612.07873 [hep-lat], (23 Dec 2016)

@article{clark2016accelerating,
  title={Accelerating Lattice QCD Multigrid on GPUs Using Fine-Grained Parallelization},
  author={Clark, M. A. and Joo, Balint and Strelchenko, Alexei and Cheng, Michael and Gambhir, Arjun and Brower, Richard},
  year={2016},
  month={dec},
  eprint={1612.07873},
  archivePrefix={arXiv},
  primaryClass={hep-lat}
}


The past decade has witnessed a dramatic acceleration of lattice quantum chromodynamics calculations in nuclear and particle physics. This has been due both to significant progress in accelerating the iterative linear solvers using multigrid algorithms and to the throughput improvements brought by GPUs. Deploying hierarchical algorithms optimally on GPUs is non-trivial owing to the lack of parallelism on the coarse grids, and as such, these advances have not proved multiplicative. Using the QUDA library, we demonstrate that by exposing all sources of parallelism that the underlying stencil problem possesses, and through appropriate mapping of this parallelism to the GPU architecture, we can achieve high efficiency even for the coarsest of grids. Results are presented for the Wilson-Clover discretization, where we demonstrate up to 10x speedup over present state-of-the-art GPU-accelerated methods on Titan. Finally, we look to the future and consider the software implications of our findings.
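To illustrate what exposing more of the stencil's parallelism can mean on a coarse grid, here is a minimal pure-Python sketch under stated assumptions. It models a generic coarse-grid stencil operator with one self-coupling matrix and one link matrix per direction and sign (2d+1 terms per site), using a hypothetical 2^4 lattice with 4 degrees of freedom per site. The function names and sizes are illustrative only; this is not QUDA's API and not the paper's actual kernel decomposition. It compares a mapping with one work item per site against one that also splits the work over matrix rows and stencil terms, followed by a reduction.

```python
import itertools
import random

def neighbors(site, dims):
    """Yield (term, neighbor_site) for the 2*d+1 stencil terms on a periodic lattice.
    Term 0 is the site itself; terms 2*mu+1 / 2*mu+2 are the forward / backward
    neighbors in direction mu."""
    yield 0, site
    for mu, length in enumerate(dims):
        fwd = list(site); fwd[mu] = (site[mu] + 1) % length
        bwd = list(site); bwd[mu] = (site[mu] - 1) % length
        yield 2 * mu + 1, tuple(fwd)
        yield 2 * mu + 2, tuple(bwd)

def apply_per_site(links, v, dims, n):
    """Coarse-grained mapping (hypothetical): one work item per lattice site."""
    out = {}
    for site in v:
        acc = [0j] * n
        for term, nb in neighbors(site, dims):
            m = links[site, term]
            for i in range(n):
                acc[i] += sum(m[i][j] * v[nb][j] for j in range(n))
        out[site] = acc
    return out, len(v)

def apply_fine_grained(links, v, dims, n):
    """Fine-grained mapping (hypothetical): one independent work item per
    (site, row, stencil term); partial results are then reduced over terms."""
    n_terms = 2 * len(dims) + 1
    tasks = [(site, i, term, nb)
             for site in v
             for term, nb in neighbors(site, dims)
             for i in range(n)]
    partial = {}
    for site, i, term, nb in tasks:  # on a GPU these items could run concurrently
        m = links[site, term]
        partial[site, i, term] = sum(m[i][j] * v[nb][j] for j in range(n))
    out = {site: [sum(partial[site, i, t] for t in range(n_terms)) for i in range(n)]
           for site in v}
    return out, len(tasks)

dims = (2, 2, 2, 2)  # tiny 4-d coarse lattice (hypothetical size)
n = 4                # degrees of freedom per coarse site (hypothetical)
rng = random.Random(0)

def rand_c():
    return complex(rng.uniform(-1, 1), rng.uniform(-1, 1))

sites = list(itertools.product(*(range(length) for length in dims)))
v = {s: [rand_c() for _ in range(n)] for s in sites}
links = {(s, t): [[rand_c() for _ in range(n)] for _ in range(n)]
         for s in sites for t in range(2 * len(dims) + 1)}

y1, w1 = apply_per_site(links, v, dims, n)
y2, w2 = apply_fine_grained(links, v, dims, n)
print(w1, w2)  # 16 576
```

On this tiny lattice the site-only mapping yields just 16 work items, far too few to occupy a GPU, whereas splitting over rows and stencil terms yields 16 × 4 × 9 = 576 independent items that produce the same result; the price is an extra reduction over the stencil terms.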

Tags: Algorithms, Computational Physics, CUDA, High Energy Physics – Lattice, nVidia, Physics, QCD, Tesla K20

December 26, 2016 by hgpu
