Lattice QCD on Intel Xeon Phi
Thomas Jefferson National Accelerator Facility, Newport News, VA, U.S.A.
International Supercomputing Conference (ISC’13), 2013
@article{joo2013lattice,
title={Lattice QCD on Intel Xeon Phi},
author={Jo{\'o}, B{\'a}lint and Kalamkar, Dhiraj D and Vaidyanathan, Karthikeyan and Smelyanskiy, Mikhail and Pamnany, Kiran and Lee, Victor W and Dubey, Pradeep and Watson, III, William},
year={2013}
}
The Intel Xeon Phi architecture offers parallelism at three levels: many x86-based cores, multiple hardware threads per core, and vector processing units. Lattice Quantum Chromodynamics (LQCD) is currently the only known model-independent, non-perturbative computational method for calculations in the theory of the strong interactions, and it is important for studies in nuclear and high-energy physics. In this contribution, we describe our experience optimizing a key LQCD kernel, the Dslash operator, for the Xeon Phi architecture. On a single node, our Dslash kernel sustains around 280 GFLOPS, while our full solver sustains around 215 GFLOPS. Furthermore, we demonstrate a fully "native" multi-node LQCD implementation that runs entirely on Knights Corner (KNC) nodes with minimal involvement of the host CPU. Our multi-node solver implementation has been strong-scaled to 3.6 TFLOPS on 64 KNCs.
July 20, 2013 by hgpu