Persistent Kernels for Iterative Memory-bound GPU Applications

Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Satoshi Matsuoka

Tokyo Institute of Technology

arXiv:2204.02064 [cs.DC]

DOI:10.48550/arXiv.2204.02064

Iterative memory-bound solvers are common in HPC codes. Typical GPU implementations have a loop on the host side that invokes the GPU kernel once per time/algorithm step. The termination of each kernel implicitly acts as the barrier required after advancing the solution at every time step. We propose a scheme for running memory-bound iterative GPU kernels: PERsistent KernelS (PERKS). In this scheme the time loop is moved inside a persistent kernel, and device-wide barriers are used for synchronization. We then reduce the traffic to device memory by caching a subset of each time step's output in registers and shared memory, to be used as input for the following time step. PERKS can be generalized to any iterative solver: it is largely independent of the solver’s implementation. We explain the design principle of PERKS and demonstrate its effectiveness for a wide range of iterative 2D/3D stencil benchmarks (geometric mean speedup of 2.29x in small domains and 1.53x in large domains) and for a Krylov subspace solver, conjugate gradient (geometric mean speedup of 4.67x on smaller SpMV datasets from SuiteSparse and 1.39x on larger SpMV datasets).
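
The contrast between a host-side time loop and a persistent kernel can be illustrated with a small CPU-side emulation. The sketch below is not the authors' CUDA implementation. It uses Python threads in place of thread blocks, `threading.Barrier` in place of a device-wide barrier, and a per-thread list in place of registers and shared memory. It also assumes a 1D 3-point averaging stencil with fixed boundary values, and it caches each block's entire subdomain, whereas PERKS caches only a subset. The names `stencil`, `host_loop`, `persistent_loop`, and the traffic counters are hypothetical and exist only for illustration.

```python
import threading


def stencil(left, center, right):
    # Assumed example stencil: 3-point average.
    return (left + center + right) / 3.0


def host_loop(u, steps):
    """Baseline: one emulated kernel launch per time step.

    Every step reads and writes the whole array in 'device memory'.
    """
    u = list(u)
    n = len(u)
    traffic = 0  # counted element accesses to emulated device memory
    for _ in range(steps):
        out = u[:]  # boundary values stay fixed
        for i in range(1, n - 1):
            out[i] = stencil(u[i - 1], u[i], u[i + 1])
        traffic += 2 * n  # n reads + n writes per launch
        u = out
    return u, traffic


def persistent_loop(u, steps, blocks):
    """Emulated persistent kernel: the time loop runs inside each worker."""
    n = len(u)
    assert 0 < blocks <= n
    bounds = [(b * n // blocks, (b + 1) * n // blocks) for b in range(blocks)]
    halo = [[0.0, 0.0] for _ in range(blocks)]  # emulated device memory
    result = [0.0] * n
    traffic = [0] * blocks
    barrier = threading.Barrier(blocks)  # stands in for a device-wide barrier

    def worker(b):
        lo, hi = bounds[b]
        local = list(u[lo:hi])  # loaded once, then kept "on chip"
        traffic[b] += hi - lo
        for _ in range(steps):
            # Publish this block's edge values for its neighbours.
            halo[b][0], halo[b][1] = local[0], local[-1]
            traffic[b] += 2
            barrier.wait()  # all halos of this step are now visible
            left = halo[b - 1][1] if b > 0 else 0.0
            right = halo[b + 1][0] if b < blocks - 1 else 0.0
            traffic[b] += (b > 0) + (b < blocks - 1)
            new = local[:]
            for j in range(hi - lo):
                if lo + j in (0, n - 1):
                    continue  # fixed domain boundary
                l = local[j - 1] if j > 0 else left
                r = local[j + 1] if j < hi - lo - 1 else right
                new[j] = stencil(l, local[j], r)
            local = new
            barrier.wait()  # nobody overwrites a halo that is still being read
        result[lo:hi] = local  # stored once at the end
        traffic[b] += hi - lo

    threads = [threading.Thread(target=worker, args=(b,)) for b in range(blocks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result, sum(traffic)
```

Both functions compute the same values. They differ only in the counted accesses to the emulated device memory: the persistent variant touches the full array once on load and once on store, and exchanges only halo values at each time step.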

Tags: Computer science, CUDA, nVidia, Package, Stencil computation, Tesla A100, Tesla V100

April 10, 2022 by hgpu
