Kokkos: Enabling performance portability across manycore architectures
Sandia National Laboratories, PO Box 5800 / MS 1318, Albuquerque NM, 87185
XSCALE, 2013
@article{edwards2013kokkos,
title={Kokkos: Enabling performance portability across manycore architectures},
author={Edwards, H Carter and Trott, Christian R},
year={2013}
}
The manycore revolution in computational hardware is characterized by increasing thread counts, decreasing memory per thread, and architecture-specific performance constraints on memory access patterns. High performance computing (HPC) on emerging manycore architectures requires codes to exploit every opportunity for thread-level parallelism and to satisfy conflicting performance constraints. We developed the Kokkos C++ library to provide scientific and engineering codes with a user-accessible, performance-portable manycore programming model. The two foundational abstractions of Kokkos are (1) dispatching work to a manycore device for parallel execution and (2) managing multidimensional arrays with polymorphic layouts. Integrating these abstractions lets users’ code satisfy multiple architecture-specific memory access pattern constraints without modifications to the source code. In this paper we describe the Kokkos abstractions, summarize the library's application programmer interface (API), and present performance results for a molecular dynamics computational kernel and a finite element mini-application.
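The two abstractions named in the abstract can be illustrated with a minimal plain C++ sketch. It does not use the Kokkos API: `LayoutRight`, `LayoutLeft`, `Array2D`, `dispatch_for`, and `row_sums` are hypothetical names chosen for this illustration, and the dispatch is a serial loop standing in for a parallel backend. The point it tries to show is that a kernel written against `a(i, j)` stays unchanged when the memory layout is swapped through a template parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout policies: each maps a 2D index to a flat offset.
struct LayoutRight {  // row-major: the last index is contiguous in memory
  static std::size_t offset(std::size_t i, std::size_t j,
                            std::size_t /*n0*/, std::size_t n1) {
    return i * n1 + j;
  }
};

struct LayoutLeft {  // column-major: the first index is contiguous in memory
  static std::size_t offset(std::size_t i, std::size_t j,
                            std::size_t n0, std::size_t /*n1*/) {
    return i + j * n0;
  }
};

// Illustrative 2D array whose storage order is a compile-time choice.
template <typename T, typename Layout>
class Array2D {
 public:
  Array2D(std::size_t n0, std::size_t n1)
      : n0_(n0), n1_(n1), data_(n0 * n1) {}

  T& operator()(std::size_t i, std::size_t j) {
    assert(i < n0_ && j < n1_);
    return data_[Layout::offset(i, j, n0_, n1_)];
  }
  const T& operator()(std::size_t i, std::size_t j) const {
    assert(i < n0_ && j < n1_);
    return data_[Layout::offset(i, j, n0_, n1_)];
  }

  std::size_t extent0() const { return n0_; }
  std::size_t extent1() const { return n1_; }
  const T* data() const { return data_.data(); }

 private:
  std::size_t n0_;
  std::size_t n1_;
  std::vector<T> data_;
};

// Serial stand-in for dispatching n independent work items to a device.
// A real backend would run the calls to f concurrently.
template <typename Functor>
void dispatch_for(std::size_t n, const Functor& f) {
  for (std::size_t i = 0; i < n; ++i) f(i);
}

// User kernel: one work item per row. It never mentions the layout.
template <typename Layout>
std::vector<double> row_sums(const Array2D<double, Layout>& a) {
  std::vector<double> sums(a.extent0(), 0.0);
  dispatch_for(a.extent0(), [&](std::size_t i) {
    double s = 0.0;
    for (std::size_t j = 0; j < a.extent1(); ++j) s += a(i, j);
    sums[i] = s;  // each work item writes only its own slot
  });
  return sums;
}
```

Calling `row_sums` on an `Array2D<double, LayoutRight>` and on an `Array2D<double, LayoutLeft>` holding the same logical values gives the same result, while the underlying memory order differs. Which order performs better depends on how the target architecture prefers neighboring threads to access memory, which is the constraint the abstract refers to.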
September 17, 2013 by hgpu