
Accelerating QDP++ using GPUs

Frank Winter
School of Physics and Astronomy, University of Edinburgh, Edinburgh EH9 3JZ, UK
arXiv:1105.2279v1 [hep-lat] (11 May 2011)

@article{2011arXiv1105.2279W,
   author = {{Winter}, F.},
   title = "{Accelerating QDP++ using GPUs}",
   journal = {ArXiv e-prints},
   archivePrefix = "arXiv",
   eprint = {1105.2279},
   primaryClass = "hep-lat",
   keywords = {High Energy Physics - Lattice, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Programming Languages, Physics - Computational Physics},
   year = 2011,
   month = may,
   adsurl = {http://adsabs.harvard.edu/abs/2011arXiv1105.2279W},
   adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}


Graphics Processing Units (GPUs) are becoming increasingly important as target architectures in scientific High Performance Computing (HPC). NVIDIA established CUDA as a parallel computing architecture that controls and exploits the compute power of GPUs. CUDA supports enough of the C++ language to enable the Expression Template (ET) technique in the device memory domain. QDP++ is a C++ vector class library suited to quantum field theory; it provides vector data types and expressions and forms the basis of the lattice QCD software suite Chroma. In this work, the evaluation of QDP++ expressions was successfully offloaded to the GPU, leveraging the ET technique together with Just-In-Time (JIT) compilation. The Portable Expression Template Engine (PETE) and the C API for CUDA kernel arguments were used to bridge the host and device memory domains. This makes it possible to offload to the GPU those Chroma routines that are typically not subject to special optimisation. As an application example, a smearing routine was ported to execute on the GPU, and a significant speed-up compared to CPU execution was measured.
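The following is a minimal sketch of the expression template technique the abstract refers to; it is illustrative only and not the actual QDP++/PETE implementation. It shows how a right-hand side such as b + c * d can be captured as a compile-time expression tree and then evaluated in a single fused loop; in the approach described above, a JIT step would emit that loop as a CUDA kernel body instead of running it on the host. All class and function names here are hypothetical.

```cpp
// Minimal expression-template sketch (illustrative only).
#include <cstddef>
#include <iostream>
#include <vector>

// Leaf: a simple vector wrapper standing in for a QDP++ lattice object.
struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

// Node: represents Op(L, R) without evaluating it.
template <typename L, typename R, typename Op>
struct BinExpr {
    const L& l;
    const R& r;
    BinExpr(const L& l_, const R& r_) : l(l_), r(r_) {}
    double operator[](std::size_t i) const { return Op::apply(l[i], r[i]); }
    std::size_t size() const { return l.size(); }
};

struct Add { static double apply(double a, double b) { return a + b; } };
struct Mul { static double apply(double a, double b) { return a * b; } };

template <typename L, typename R>
BinExpr<L, R, Add> operator+(const L& l, const R& r) { return {l, r}; }

template <typename L, typename R>
BinExpr<L, R, Mul> operator*(const L& l, const R& r) { return {l, r}; }

// Single evaluation loop over the whole expression tree. This is the point
// where a JIT-compiled CUDA kernel could be generated and launched instead
// of iterating on the host.
template <typename Expr>
void assign(Vec& dst, const Expr& e) {
    for (std::size_t i = 0; i < dst.size(); ++i) dst.data[i] = e[i];
}

int main() {
    Vec a(4), b(4, 1.0), c(4, 2.0), d(4, 3.0);
    assign(a, b + c * d);       // one fused loop, no intermediate temporaries
    std::cout << a[0] << '\n';  // prints 7
}
```

The key design point is that operator+ and operator* build types, not values: the shape of the expression is known at compile time, so a single traversal per lattice site suffices, which is what makes mapping the evaluation onto a GPU kernel feasible.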