A Versatile Software Systolic Execution Model for GPU Memory-Bound Kernels

Peng Chen, Mohamed Wahib, Shinichiro Takizawa, Ryousei Takano, Satoshi Matsuoka

Tokyo Institute of Technology, AIST-Tokyo Tech Real World, Big-Data Computation Open Innovation Laboratory, National Institute of Advanced Industrial Science and Technology

arXiv:1907.06154 [cs.DC] (14 Jul 2019)

@misc{chen2019versatile,
  title={A Versatile Software Systolic Execution Model for GPU Memory-Bound Kernels},
  author={Peng Chen and Mohamed Wahib and Shinichiro Takizawa and Ryousei Takano and Satoshi Matsuoka},
  year={2019},
  eprint={1907.06154},
  archivePrefix={arXiv},
  primaryClass={cs.DC}
}

This paper proposes a versatile high-performance execution model, inspired by systolic arrays, for memory-bound regular kernels running on CUDA-enabled GPUs. We formulate a systolic model that shifts partial sums by CUDA warp primitives for the computation. We also employ register files as a cache resource in order to operate the entire model efficiently. We demonstrate the effectiveness and versatility of the proposed model for a wide variety of stencil kernels that appear commonly in HPC, and also convolution kernels (increasingly important in deep learning workloads). Our algorithm outperforms the top reported state-of-the-art stencil implementations, including implementations with sophisticated temporal and spatial blocking techniques, on the two latest Nvidia architectures: Tesla V100 and P100. For 2D convolution of general filter sizes and shapes, our algorithm is on average 2.5x faster than Nvidia’s NPP on V100 and P100 GPUs.
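The core idea of the abstract can be illustrated on a 1D filter. Each lane of a warp caches one input element in a register, and partial sums are shifted one lane at a time with a warp primitive while every lane adds its own weighted term. The snippet below is a minimal sketch under stated assumptions, not the authors' implementation. It emulates a single 32-lane warp in pure Python. It assumes a shift-up scheme modeled on CUDA's `__shfl_up_sync`, where a lane whose source index falls below lane 0 keeps its own value, and a "valid" correlation `y[j] = sum_k w[k] * x[j + k]`. The function names are hypothetical.

```python
WARP_SIZE = 32  # lanes per CUDA warp


def shfl_up(values, delta):
    """Emulate __shfl_up_sync over one full warp: lane i reads lane i - delta.
    Lanes with i - delta < 0 keep their own value, matching CUDA semantics."""
    return [values[i - delta] if i >= delta else values[i]
            for i in range(len(values))]


def systolic_conv1d_warp(x, w):
    """Hypothetical systolic 1D correlation for one warp-sized tile.

    Lane i caches x[i] in a 'register'. Partial sums move one lane per step
    (the systolic shift), and each lane adds w[k] * (its cached input)."""
    assert len(x) == WARP_SIZE
    taps = len(w)
    reg = list(x)                        # register cache: lane i holds x[i]
    partial = [w[0] * v for v in reg]    # step 0: lane i starts w[0] * x[i]
    for k in range(1, taps):
        partial = shfl_up(partial, 1)    # shift partial sums to the next lane
        partial = [p + w[k] * v for p, v in zip(partial, reg)]
    # Lane i (i >= taps - 1) now holds y[i - (taps - 1)]; lanes below that
    # mixed in their own stale values, so they are the discarded halo.
    return partial[taps - 1:]


def reference_conv1d(x, w):
    """Direct valid correlation, used to check the systolic version."""
    taps = len(w)
    return [sum(w[j] * x[i + j] for j in range(taps))
            for i in range(len(x) - taps + 1)]


if __name__ == "__main__":
    x = list(range(WARP_SIZE))
    w = [1, 2, 3]
    y = systolic_conv1d_warp(x, w)
    print(y[:2])  # [8, 14]
    assert y == reference_conv1d(x, w)
```

In this sketch, a filter with K taps needs K - 1 shift steps and produces 32 - K + 1 valid outputs per warp. The lowest K - 1 lanes serve as halo. Each input is read into a register once, and the data movement between neighbouring outputs happens through lane-to-lane shuffles rather than through shared or global memory, which is the property that suits memory-bound kernels. A real CUDA kernel would tile the input across warps and overlap the halos. The 2D stencils and convolutions in the paper extend this beyond the single row shown here.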

Tags: Computer science, CUDA, nVidia, Package, Stencil computation, Tesla P100, Tesla V100

July 21, 2019 by hgpu