high performance computing on graphics processing units: hgpu.org


Lattice QCD on Intel Xeon Phi

Balint Joo, Dhiraj D. Kalamkar, Karthikeyan Vaidyanathan, Mikhail Smelyanskiy, Kiran Pamnany, Victor W Lee, Pradeep Dubey, William Watson III

Thomas Jefferson National Accelerator Facility, Newport News, VA, U.S.A

International Supercomputing Conference (ISC’13), 2013

@article{joo2013lattice,
  title={Lattice QCD on Intel Xeon Phi},
  author={Jo{\'o}, B{\'a}lint and Kalamkar, Dhiraj D and Vaidyanathan, Karthikeyan and Smelyanskiy, Mikhail and Pamnany, Kiran and Lee, Victor W and Dubey, Pradeep and Watson, III, William},
  year={2013}
}


The Intel Xeon Phi architecture offers parallelism at several levels: many x86-based cores, multiple hardware threads per core, and vector processing units. Lattice Quantum Chromodynamics (LQCD) is currently the only known model-independent, non-perturbative computational method for calculations in the theory of the strong interactions, and it is important in studies of nuclear and high-energy physics. In this contribution, we describe our experience optimizing a key LQCD kernel, the Dslash operator, for the Xeon Phi architecture. On a single node, our Dslash kernel sustains around 280 GFLOPS, while our full solver sustains around 215 GFLOPS. Furthermore, we demonstrate a fully "native" multi-node LQCD implementation that runs entirely on Knights Corner (KNC) nodes with minimal involvement of the host CPU. Our multi-node implementation of the solver has been strong-scaled to 3.6 TFLOPS on 64 KNCs.
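
The abstract does not define Dslash. For orientation, here is a naive NumPy reference sketch of the Wilson Dslash (hopping) term in one common sign convention, assuming the Wilson discretization: D psi(x) = sum over mu of [ (1 - gamma_mu) U_mu(x) psi(x + mu) + (1 + gamma_mu) U_mu(x - mu)^dagger psi(x - mu) ]. This is not the authors' optimized Xeon Phi kernel. The array layout, the chiral gamma basis, the periodic boundaries, and the helper names `wilson_dslash` and `spin_mul` are all illustrative assumptions.

```python
import numpy as np

# Pauli matrices sigma_1, sigma_2, sigma_3.
SIGMA = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]
I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
I4 = np.eye(4, dtype=complex)

# Hermitian Euclidean gamma matrices gamma_1..gamma_4 in a chiral basis
# (an assumed basis; other codes may use different conventions).
GAMMA = [np.block([[Z2, -1j * s], [1j * s, Z2]]) for s in SIGMA]
GAMMA.append(np.block([[Z2, I2], [I2, Z2]]))
GAMMA5 = GAMMA[0] @ GAMMA[1] @ GAMMA[2] @ GAMMA[3]


def spin_mul(m, field):
    """Apply a 4x4 spin matrix m at every site of a (..., 4, 3) field."""
    return np.einsum("st,...tc->...sc", m, field)


def wilson_dslash(psi, U):
    """Naive Wilson hopping term with periodic boundaries (hypothetical helper).

    psi: spinor field, shape (L0, L1, L2, L3, 4 spins, 3 colors)
    U:   gauge links, shape (4, L0, L1, L2, L3, 3, 3); U[mu] links x -> x + mu
    """
    out = np.zeros(psi.shape, dtype=complex)
    for mu in range(4):
        # Forward hop U_mu(x) psi(x + mu): rolling by -1 brings site x + mu to x.
        psi_fwd = np.roll(psi, -1, axis=mu)
        fwd = np.einsum("...ab,...sb->...sa", U[mu], psi_fwd)
        # Backward hop U_mu(x - mu)^dagger psi(x - mu).
        U_bwd = np.roll(U[mu], 1, axis=mu)
        psi_bwd = np.roll(psi, 1, axis=mu)
        bwd = np.einsum("...ba,...sb->...sa", U_bwd.conj(), psi_bwd)
        # Spin projectors (1 - gamma_mu) forward and (1 + gamma_mu) backward.
        out += spin_mul(I4 - GAMMA[mu], fwd) + spin_mul(I4 + GAMMA[mu], bwd)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L = 4
    shape = (L, L, L, L, 4, 3)
    psi = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    chi = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    ushape = (4, L, L, L, L, 3, 3)
    U = rng.standard_normal(ushape) + 1j * rng.standard_normal(ushape)
    # gamma5-hermiticity check: <chi, g5 D g5 psi> should equal <D chi, psi>.
    lhs = np.vdot(chi, spin_mul(GAMMA5, wilson_dslash(spin_mul(GAMMA5, psi), U)))
    rhs = np.vdot(wilson_dslash(chi, U), psi)
    print(np.isclose(lhs, rhs))
```

The check relies on the identity gamma_5 D gamma_5 = D^dagger, which holds for arbitrary link matrices. A small-lattice reference of this kind is one plausible way to validate a heavily vectorized kernel, though the paper may have used a different approach.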

Tags: CUDA, High Energy Physics – Lattice, Intel, Intel Phi, nVidia, Physics, QCD, Tesla K20

July 20, 2013 by hgpu
