Understanding the Costs of Many-Task Computing Workloads on Intel Xeon Phi Coprocessors
Department of Computer Science, Illinois Institute of Technology
2nd Greater Chicago Area System Research Workshop (GCASR), 2013
@article{johnson2013understanding,
title={Understanding the Costs of Many-Task Computing Workloads on Intel Xeon Phi Coprocessors},
author={Johnson, Jeffrey and Krieder, Scott J and Grimmer, Benjamin and Wozniak, Justin M and Wilde, Michael and Raicu, Ioan},
year={2013}
}
Many-Task Computing (MTC) aims to bridge the gap between HPC and HTC. MTC emphasizes running many computational tasks over a short period of time, where tasks can be either dependent or independent of one another. MTC has been well supported on Clouds, Grids, and Supercomputers on traditional computing architectures, but the abundance of hybrid large-scale systems using accelerators has motivated us to explore the support of MTC on the new Intel Xeon Phi accelerators. The Xeon Phi is a PCI-Express based expansion card comprised of 60 cores supporting 240 hardware threads to produce up to 1 teraflop of double- precision performance in a single accelerator. These cards are already being integrated into super-computing clusters such as Stampede, which hosts over 6,400 Xeon Phi Accelerators totaling in over 7 petaflops of double- precision performance. This work provides an in depth understanding of MTC on the Intel Xeon Phi and presents our preliminary results of running several different workloads on pre-production Intel Xeon Phi hardware. By utilizing Intel’s provided SCIF protocol for communicating across the PCI-Express bus we have achieved over 90% efficiency near or outperforming OpenMP offloading tasks over 300 uS with our batch framework. This performance opens the opportunity for the development of a framework for executing heterogeneous tasks on the Xeon Phi alongside other potential accelerators including graphics cards for MTC applications. Our framework will provide fine granularity for executing MTC applications across large scale compute clusters. It will be integrated with our existing graphics card framework, GeMTC, to provide transparent access to GPUs, Xeon Phis, and future generations of accelerators to help bridge the gap into Exascale computing.
October 21, 2013 by hgpu