Coulomb, Landau and Maximally Abelian Gauge Fixing in Lattice QCD with Multi-GPUs

Mario Schröck, Hannes Vogt
Institut für Physik, FB Theoretische Physik, Universität Graz, 8010 Graz, Austria
arXiv:1212.5221 [hep-lat] (20 Dec 2012)

@article{2012arXiv1212.5221S,
   author = {{Schr{\"o}ck}, M. and {Vogt}, H.},
   title = "{Coulomb, Landau and Maximally Abelian Gauge Fixing in Lattice QCD with Multi-GPUs}",
   journal = {ArXiv e-prints},
   archivePrefix = "arXiv",
   eprint = {1212.5221},
   primaryClass = "hep-lat",
   keywords = {High Energy Physics - Lattice},
   year = 2012,
   month = dec,
   adsurl = {http://adsabs.harvard.edu/abs/2012arXiv1212.5221S},
   adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}


A lattice gauge theory framework for simulations on graphics processing units (GPUs) using NVIDIA's CUDA is presented. The code comprises template classes that implement an optimal data layout to ensure coalesced reads from device memory and thereby maximum performance. In this work we concentrate on applications for lattice gauge fixing in 3+1 dimensional SU(3) lattice gauge field theories. We employ the overrelaxation, stochastic relaxation and simulated annealing algorithms, which are well suited to acceleration on highly parallel architectures like GPUs. The applications support the Coulomb, Landau and maximally Abelian gauges. Moreover, we explore the evolution of the numerical accuracy of the SU(3)-valued degrees of freedom over the runtime of the algorithms in single (SP) and double precision (DP). From this we draw conclusions on the reliability of SP and DP simulations and suggest a mixed-precision scheme that performs the critical parts of the algorithm in full DP while retaining 80-90% of the SP performance. Finally, multiple GPUs are adopted to overcome the memory constraint of a single GPU. We present a communicator class that effectively hides the MPI data exchange at the boundaries of the lattice domains, which runs over the low-bandwidth PCI bus, behind calculations in the inner part of the domain. Linear scaling using 16 NVIDIA Tesla C2070 devices and a maximum performance of 3.5 Teraflops on lattices of size down to 64^3 x 256 is demonstrated.
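
As an illustration of the coalescing idea, the following CUDA sketch shows a structure-of-arrays layout for the SU(3) links. It is a minimal sketch under assumed names, not the paper's actual template classes: SU3SoA, elem and gaugeTransformKernel are hypothetical. Component k of every link matrix is stored contiguously across sites, so consecutive threads read consecutive addresses.

    #include <cstddef>
    #include <cuda_runtime.h>

    // Hypothetical structure-of-arrays container for SU(3) links: component k
    // of the link at lattice site 'site' lives at offset k*volume + site, so
    // a warp of consecutive sites reads consecutive addresses (coalesced).
    template <typename Real>
    struct SU3SoA {
        Real*  data;    // 18 * volume reals (9 complex matrix entries per link)
        size_t volume;  // number of lattice sites

        __host__ __device__ Real& elem(size_t site, int k) {  // k = 0..17
            return data[static_cast<size_t>(k) * volume + site];
        }
    };

    template <typename Real>
    __global__ void gaugeTransformKernel(SU3SoA<Real> links) {
        size_t site = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
        if (site >= links.volume) return;

        Real u[18];
        for (int k = 0; k < 18; ++k)      // each iteration is one fully
            u[k] = links.elem(site, k);   // coalesced load across the warp

        // ... apply the local gauge transformation to u[] ...

        for (int k = 0; k < 18; ++k)      // coalesced write-back
            links.elem(site, k) = u[k];
    }

An array-of-structures layout would instead stride by 18 reals between neighbouring threads, wasting most of each memory transaction.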
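
The abstract leaves open which parts of the algorithm are the critical ones, so the following sketch is only an assumed instance of the mixed-precision pattern: the SU(3) links stay in single precision in device memory, while a numerically sensitive step, here the renormalization of a matrix row during re-unitarization, is carried out in double precision.

    #include <cuda_runtime.h>

    // Assumed mixed-precision pattern: links are stored as floats, but the
    // sensitive arithmetic -- renormalizing one row (3 complex numbers) of an
    // SU(3) matrix -- is promoted to double before demoting for storage.
    __device__ void renormalizeRowMixed(float* row /* 6 floats = 3 complex */) {
        double v[6], norm2 = 0.0;
        for (int k = 0; k < 6; ++k) {             // promote to double precision
            v[k] = static_cast<double>(row[k]);
            norm2 += v[k] * v[k];
        }
        double scale = rsqrt(norm2);              // 1/sqrt in double precision
        for (int k = 0; k < 6; ++k)               // demote to the storage format
            row[k] = static_cast<float>(v[k] * scale);
    }

Because only a small fraction of the arithmetic is promoted while all memory traffic remains in SP, such a scheme can retain most of the SP throughput, consistent with the 80-90% quoted above.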
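
The overlap of communication and computation can be sketched with two CUDA streams and MPI as below. This is a hypothetical outline, not the interface of the communicator class itself: the kernels, buffer names and neighbour ranks are placeholders, and the host buffers are assumed to be pinned (cudaMallocHost) so that the asynchronous copies can proceed.

    #include <mpi.h>
    #include <cuda_runtime.h>

    __global__ void updateBoundary(float* links) { /* boundary-slice update omitted */ }
    __global__ void updateBulk(float* links)     { /* inner-domain update omitted */ }

    // One relaxation sweep that hides the halo exchange over the PCI bus and
    // MPI behind the update of the inner domain (all names are placeholders).
    void sweepWithOverlap(float* links, float* devHaloOut, float* devHaloIn,
                          char* hostSend, char* hostRecv, int haloBytes,
                          int rankUp, int rankDown, dim3 gridBnd, dim3 gridBulk,
                          dim3 block, cudaStream_t haloStream, cudaStream_t bulkStream)
    {
        // 1. update the boundary slices first and start staging them to the host
        updateBoundary<<<gridBnd, block, 0, haloStream>>>(links);
        cudaMemcpyAsync(hostSend, devHaloOut, haloBytes,
                        cudaMemcpyDeviceToHost, haloStream);

        // 2. the bulk update runs concurrently in a second stream
        updateBulk<<<gridBulk, block, 0, bulkStream>>>(links);

        // 3. exchange halos with the neighbouring domains once staging is done
        cudaStreamSynchronize(haloStream);
        MPI_Sendrecv(hostSend, haloBytes, MPI_BYTE, rankUp,   0,
                     hostRecv, haloBytes, MPI_BYTE, rankDown, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        cudaMemcpyAsync(devHaloIn, hostRecv, haloBytes,
                        cudaMemcpyHostToDevice, haloStream);

        // 4. both streams must finish before the next sweep starts
        cudaDeviceSynchronize();
    }

As long as the bulk update takes at least as long as the halo staging and exchange, the traffic over the PCI bus is fully hidden, which is the prerequisite for the linear scaling to 16 devices reported above.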
