high performance computing on graphics processing units: hgpu.org

HISQ inverter on Intel Xeon Phi and NVIDIA GPUs

O. Kaczmarek, C. Schmidt, P. Steinbrecher, Swagato Mukherjee, M. Wagner

Fakultät für Physik, Universität Bielefeld, D-33615 Bielefeld, Germany

arXiv:1409.1510 [cs.DC], (4 Sep 2014)

@article{2014arXiv1409.1510K,
  author={{Kaczmarek}, O. and {Schmidt}, C. and {Steinbrecher}, P. and {Mukherjee}, S. and {Wagner}, M.},
  title={{HISQ inverter on Intel Xeon Phi and NVIDIA GPUs}},
  journal={ArXiv e-prints},
  archivePrefix={arXiv},
  eprint={1409.1510},
  primaryClass={cs.DC},
  keywords={Computer Science – Distributed, Parallel, and Cluster Computing, High Energy Physics – Lattice},
  year={2014},
  month={sep},
  adsurl={http://adsabs.harvard.edu/abs/2014arXiv1409.1510K},
  adsnote={Provided by the SAO/NASA Astrophysics Data System}
}

The runtime of a Lattice QCD simulation is dominated by a small kernel, which calculates the product of a vector by a sparse matrix known as the "Dslash" operator. Therefore, this kernel is frequently optimized for various HPC architectures. In this contribution we compare the performance of the Intel Xeon Phi to current Kepler-based NVIDIA Tesla GPUs running a conjugate gradient solver. By exposing more parallelism to the accelerator through inverting multiple vectors at the same time, we obtain a performance of 250 GFlop/s on both architectures. This more than doubles the performance of the inversions. We give a short overview of both architectures, and discuss some details of the implementation and the effort required to obtain the achieved performance.
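
The idea of inverting multiple vectors at the same time can be sketched as a conjugate gradient loop in which one sparse-matrix pass serves several right-hand sides. The code below is a minimal illustration under stated assumptions, not the paper's implementation: the operator is a hypothetical stand-in (a small periodic 1D Laplacian with a mass term, which is symmetric positive definite), not the HISQ Dslash, and the names `make_operator`, `apply_multi`, and `cg_multi` are invented for this example. Plain Python is used only for readability; the paper's kernels target the Xeon Phi and NVIDIA GPUs. A common rationale for this layout is that sparse matrix-vector products tend to be limited by memory bandwidth, so reusing each loaded matrix entry for several vectors yields more arithmetic per byte read.

```python
def make_operator(n, mass=0.5):
    """Hypothetical stand-in operator: rows of a periodic 1D Laplacian
    plus a mass term, stored as lists of (column, value) pairs."""
    rows = []
    for i in range(n):
        rows.append([(i, 2.0 + mass), ((i - 1) % n, -1.0), ((i + 1) % n, -1.0)])
    return rows


def apply_multi(rows, xs):
    """Apply the sparse matrix to every vector in xs in a single pass."""
    ys = [[0.0] * len(rows) for _ in xs]
    for i, row in enumerate(rows):
        for col, val in row:            # each matrix entry is read once ...
            for x, y in zip(xs, ys):    # ... and reused for every vector
                y[i] += val * x[col]
    return ys


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def cg_multi(rows, bs, tol=1e-10, max_iter=1000):
    """Run independent CG recurrences for all right-hand sides in bs,
    sharing one matrix application per iteration."""
    n = len(rows)
    xs = [[0.0] * n for _ in bs]        # initial guess x = 0
    rs = [list(b) for b in bs]          # residuals r = b - A x = b
    ps = [list(b) for b in bs]          # search directions
    rr = [dot(r, r) for r in rs]        # squared residual norms
    for _ in range(max_iter):
        # Only systems that have not yet converged are updated.
        active = [j for j in range(len(bs)) if rr[j] > tol * tol]
        if not active:
            break
        aps = apply_multi(rows, ps)     # one matrix pass for all vectors
        for j in active:
            alpha = rr[j] / dot(ps[j], aps[j])
            for i in range(n):
                xs[j][i] += alpha * ps[j][i]
                rs[j][i] -= alpha * aps[j][i]
            rr_new = dot(rs[j], rs[j])
            beta = rr_new / rr[j]
            for i in range(n):
                ps[j][i] = rs[j][i] + beta * ps[j][i]
            rr[j] = rr_new
    return xs


# Usage: four unit-vector right-hand sides on a 16-site stand-in operator.
rows = make_operator(16)
bs = [[1.0 if i == s else 0.0 for i in range(16)] for s in range(4)]
xs = cg_multi(rows, bs)
```

Each right-hand side keeps its own step sizes, so the iterates match what separate single-vector solves would produce; only the matrix traversal is shared. For simplicity the sketch still passes already converged vectors through `apply_multi`, which a tuned kernel would likely avoid.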

Tags: Conjugate gradient solver, High Energy Physics – Lattice, Intel Xeon Phi, nVidia, nVidia GeForce GTX Titan, Physics, QCD, Sparse matrix, Tesla K20, Tesla K40

September 5, 2014 by hgpu
