8694

Coulomb, Landau and Maximally Abelian Gauge Fixing in Lattice QCD with Multi-GPUs

Mario Schrock, Hannes Vogt
Institut fur Physik, FB Theoretische Physik, Universitat Graz, 8010 Graz, Austria
arXiv:1212.5221 [hep-lat] (20 Dec 2012)

@article{2012arXiv1212.5221S,

   author={Schr{"o}ck}, M. and {Vogt}, H.},

   title={"{Coulomb, Landau and Maximally Abelian Gauge Fixing in Lattice QCD with Multi-GPUs}"},

   journal={ArXiv e-prints},

   archivePrefix={"arXiv"},

   eprint={1212.5221},

   primaryClass={"hep-lat"},

   keywords={High Energy Physics – Lattice},

   year={2012},

   month={dec},

   adsurl={http://adsabs.harvard.edu/abs/2012arXiv1212.5221S},

   adsnote={Provided by the SAO/NASA Astrophysics Data System}

}

Download Download (PDF)   View View   Source Source   

895

views

A lattice gauge theory framework for simulations on graphic processing units (GPUs) using NVIDIA’s CUDA is presented. The code comprises template classes that take care of an optimal data pattern to ensure coalesced reading from device memory to achieve maximum performance. In this work we concentrate on applications for lattice gauge fixing in 3+1 dimensional SU(3) lattice gauge field theories. We employ the overrelaxation, stochastic relaxation and simulated annealing algorithms which are perfectly suited to be accelerated by highly parallel architectures like GPUs. The applications support the Coulomb, Landau and maximally Abelian gauges. Moreover, we explore the evolution of the numerical accuracy of the SU(3) valued degrees of freedom over the runtime of the algorithms in single (SP) and double precision (DP). Therefrom we draw conclusions on the reliability of SP and DP simulations and suggest a mixed precision scheme that performs the critical parts of the algorithm in full DP while retaining 80-90% of the SP performance. Finally, multi-GPUs are adopted to overcome the memory constraint of single GPUs. A communicator class which hides the MPI data exchange at the boundaries of the lattice domains, via the low bandwidth PCI-Bus, effectively behind calculations in the inner part of the domain is presented. Linear scaling using 16 NVIDIA Tesla C2070 devices and a maximum performance of 3.5 Teraflops on lattices of size down to 64^3 x 256 is demonstrated.
No votes yet.
Please wait...

* * *

* * *

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors

Contact us: