high performance computing on graphics processing units: hgpu.org


QUDA programming for staggered quarks

Steven Gottlieb, Guochun Shi, Aaron Torok, Volodymyr Kindratenko

National Center for Supercomputing Applications, University of Illinois and Indiana University, Bloomington, IN 47405, USA

In Proc. The XXVIII International Symposium on Lattice Field Theory – Lattice’10, 2010

@conference{gottlieb2010quda,
  title={Quda programming for staggered quarks},
  author={Gottlieb, S. and Shi, G. and Torok, A. and Kindratenko, V.},
  booktitle={Proc. XXVIII International Symposium on Lattice Field Theory (Lattice 2010), Villasimius, Sardinia},
  year={2010}
}


Source code package: QUDA: A library for QCD on GPUs


We have been extending the QUDA GPU code developed at Boston University to include the case of improved staggered quarks. Improved staggered quarks such as asqtad and HISQ require both first- and third-nearest-neighbor terms in the Dirac operator. We call the corresponding links fatlinks and longlinks, respectively. The fatlinks are not unitary, and staggered phases are included in the links, so link reconstruction techniques may either be inapplicable or require modification. A single-precision inverter using compressed storage for the longlinks achieves a speed of 100 GF/s on an NVIDIA GTX 280 GPU on a 24^3×32 lattice. In addition to the inverter code, we have code for fatlink computation, gauge force and fermion force, which run at 170, 186 and 107 GF/s, respectively, under conditions similar to those of the solver benchmark above. The single-GPU code is currently in production on NCSA’s AC cluster for the study of electromagnetic effects. There, the double-precision multimass solver runs at 20 GF/s, about 80% of the speed of an 8-node, or 64-core, job on Fermilab’s jpsi cluster. The AC cluster has Tesla C1060 boards, which have lower memory bandwidth than the GTX 280, on which the double-precision inverter runs at 33 GF/s. Multi-GPU code is in development.
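The abstract refers to compressed storage and to link reconstruction without naming a scheme. The sketch below illustrates one widely used option, assumed here for illustration only: store the first two rows of an SU(3) link (12 real numbers instead of 18) and rebuild the third row as the complex conjugate of their cross product. The function names are hypothetical and are not QUDA's API. The example also shows why the abstract's caveat applies: a link multiplied by a staggered phase of -1, or a non-unitary link, is not recovered by this naive reconstruction.

```python
import cmath


def reconstruct_third_row(row0, row1):
    """Third row of an SU(3) matrix: the conjugate of row0 x row1."""
    a0, a1, a2 = row0
    b0, b1, b2 = row1
    return [
        (a1 * b2 - a2 * b1).conjugate(),
        (a2 * b0 - a0 * b2).conjugate(),
        (a0 * b1 - a1 * b0).conjugate(),
    ]


def compress(link):
    """Keep only the first two rows (12 real numbers)."""
    return [list(link[0]), list(link[1])]


def decompress(rows):
    """Rebuild a full 3x3 matrix, assuming the original was in SU(3)."""
    return [list(rows[0]), list(rows[1]),
            reconstruct_third_row(rows[0], rows[1])]


def scale(link, factor):
    """Multiply every entry of a link by a scalar factor."""
    return [[factor * z for z in row] for row in link]


def max_abs_diff(m1, m2):
    """Largest entrywise difference between two 3x3 matrices."""
    return max(abs(x - y) for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))


# An example SU(3) link: diagonal phases times a rotation in the 0-1 plane.
p, q = cmath.exp(0.3j), cmath.exp(-1.1j)
c, s = 0.6, 0.8
U = [
    [p * c, p * s, 0j],
    [-q * s, q * c, 0j],
    [0j, 0j, (p * q).conjugate()],
]

# A unitary link survives the round trip.
error_unitary = max_abs_diff(decompress(compress(U)), U)

# A link carrying a staggered phase of -1 does not: the cross product
# is unchanged when both stored rows flip sign, so the third row comes
# back with the wrong sign.
flipped = scale(U, -1)
error_phase = max_abs_diff(decompress(compress(flipped)), flipped)

# A non-unitary link (here simply 2*U, a stand-in for a fatlink) is
# not recovered either.
fat = scale(U, 2)
error_fat = max_abs_diff(decompress(compress(fat)), fat)

print(error_unitary, error_phase, error_fat)
```

For links that carry a phase, one possible modification is to store the sign separately and apply it to the reconstructed row; whether the authors do this is not stated in the abstract.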

Tags: CUDA, High Energy Physics – Lattice, Monte Carlo simulation, nVidia, nVidia GeForce GTX 280, nVidia GeForce GTX 285, nVidia GeForce GTX 480, Package, Physics, QCD, Tesla C1060, Tesla C2050, Tesla S1070

February 4, 2011 by hgpu
