18415

libhclooc: Software Library Facilitating Out-of-core Implementations of Accelerator Kernels on Hybrid Computing Platforms

Daniel Hanlon, Hamidreza Khalighzadeh, Ravi Reddy Manumachu, Alexey Lastovetsky
School of Computer Science, University College Dublin, Dublin, Ireland
arXiv:1808.05056 [cs.DC], (15 Aug 2018)

@article{hanlon2018libhclooc,

   title={libhclooc: Software Library Facilitating Out-of-core Implementations of Accelerator Kernels on Hybrid Computing Platforms},

   author={Hanlon, Daniel and Khalighzadeh, Hamidreza and Manumachu, Ravi Reddy and Lastovetsky, Alexey},

   year={2018},

   month={aug},

   archivePrefix={"arXiv"},

   primaryClass={cs.DC}

}

Hardware accelerators such as Graphics Processing Units (GPUs), Intel Xeon Phi co-processors (PHIs), and Field-Programmable Gate Arrays (FPGAs) are now ubiquitous in extreme-scale high performance computing (HPC), cloud, and Big data platforms to facilitate execution of workloads that demand high energy efficiency. They present unique interfaces and programming models therefore posing several limitations, which must be addressed to facilitate execution of large workloads. There is no library providing a unifying interface that allows programmers to write reusable out-of-core implementations of their data-parallel kernels that can run efficiently on different mainstream accelerators such as GPUs, PHIs, and FPGAs. We address this shortage in this paper. We present a library called libhclooc, which provides a unifying interface facilitating out-of-core implementations for data parallel kernels on the three different mainstream accelerators (GPUs, Intel Xeon Phis, FPGAs). We implement out-of-core matrix-matrix multiplication (MMOOC) using the libhclooc API and demonstrate its superior performance over vendor implementations. We show that it suffers from a maximum overhead of 10%, 4%, and 8% (due to abstraction) compared to the state-of-the-art optimised implementations for Nvidia K40c GPU, Nvidia P100 PCIe GPU, and Intel Xeon Phi 3120P respectively. We also show that using libhclooc API reduces the number of lines of code (LOC) by 75% thereby drastically improving programmer productivity.
No votes yet.
Please wait...

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: