An Optimized Multiple Right-Hand Side Dslash Kernel for Intel Xeon Phi
Old Dominion University
Old Dominion University, 2016
@article{walden2016optimized,
title={An Optimized Multiple Right-Hand Side Dslash Kernel for Intel Xeon Phi},
author={Walden, Aaron},
year={2016}
}
Lattice quantum chromodynamics (LQCD) is the only computationally tractable, non-perturbative, and model-independent quantum field theory of the strong nuclear force. The computational core of LQCD is the Wilson Dslash operator, a nearest-neighbor stencil operator that sums matrix-vector multiplications over lattice points and whose performance is bandwidth-bound on most architectures. Reportedly, up to 90% of LQCD running time may be spent computing Dslash. In recent years, researchers have worked to optimize LQCD calculations for floating-point coprocessor cards such as GPUs and Intel Xeon Phi Knights Corner (KNC), which boast powerful vector processing units. Most of these Dslash efforts have focused on single right-hand side solvers. This thesis presents two optimized Dslash kernels that simplify vectorization by using multiple right-hand sides and that traverse lattices using novel methods. The speedups resulting from these approaches are explored in the context of KNC’s architecture.
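To make the multiple right-hand side idea concrete, here is a minimal sketch. It is not the thesis's kernel. It assumes a toy one-dimensional periodic lattice with 3x3 complex link matrices, and it omits spin projection, the four-dimensional geometry, and any KNC-specific tuning. All names (`toy_dslash_single`, `toy_dslash_mrhs`, `random_matrix`) are hypothetical. The point is only that, with several right-hand sides stored contiguously, each link-matrix element is read once and reused across all of them, and the innermost loop runs over the right-hand side index, which is the kind of loop that maps naturally onto vector lanes.

```python
import random

NC = 3  # number of colors (rows/columns of each link matrix)


def random_matrix(rng):
    """Hypothetical helper: a random 3x3 complex matrix (not a true SU(3) element)."""
    return [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(NC)]
            for _ in range(NC)]


def toy_dslash_single(links, vec):
    """Toy stencil for ONE right-hand side.

    links[x] is the matrix on the link from site x to site x+1.
    vec[x][c] is the color-c component at site x.
    out[x] = U[x] * vec[x+1] + U[x-1]^dagger * vec[x-1]  (periodic boundaries)
    """
    n_sites = len(links)
    out = [[0j] * NC for _ in range(n_sites)]
    for x in range(n_sites):
        fwd = (x + 1) % n_sites
        bwd = (x - 1) % n_sites
        for a in range(NC):
            for b in range(NC):
                u_f = links[x][a][b]
                u_b = links[bwd][b][a].conjugate()  # element (a, b) of U^dagger
                out[x][a] += u_f * vec[fwd][b] + u_b * vec[bwd][b]
    return out


def toy_dslash_mrhs(links, psi):
    """Same toy stencil for MANY right-hand sides at once.

    psi[x][c][r] is the color-c component of right-hand side r at site x.
    Each matrix element is fetched once per (x, a, b) and then applied to
    every right-hand side in the innermost loop over r.
    """
    n_sites = len(links)
    n_rhs = len(psi[0][0])
    out = [[[0j] * n_rhs for _ in range(NC)] for _ in range(n_sites)]
    for x in range(n_sites):
        fwd = (x + 1) % n_sites
        bwd = (x - 1) % n_sites
        for a in range(NC):
            for b in range(NC):
                u_f = links[x][a][b]
                u_b = links[bwd][b][a].conjugate()
                for r in range(n_rhs):  # contiguous, vector-friendly loop
                    out[x][a][r] += u_f * psi[fwd][b][r] + u_b * psi[bwd][b][r]
    return out
```

In this sketch, column `r` of the multiple right-hand side result matches `toy_dslash_single` applied to right-hand side `r` alone; the two differ only in loop order and data layout, which is where the reuse of each link matrix comes from.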
July 28, 2016 by hgpu