
Delivering Performance-Portable Stencil Computations on CPUs and GPUs Using Bricks

Tuowen Zhao, Samuel Williams, Mary Hall, Hans Johansen
School of Computing, University of Utah
International Workshop on Performance, Portability, and Productivity in HPC (P3HPC), 2018

@inproceedings{zhao2019delivering,
  title     = {Delivering Performance-Portable Stencil Computations on CPUs and GPUs Using Bricks},
  author    = {Zhao, Tuowen and Williams, Samuel and Hall, Mary and Johansen, Hans},
  booktitle = {International Workshop on Performance, Portability, and Productivity in HPC (P3HPC)},
  year      = {2018}
}

Achieving high performance on stencil computations poses a number of challenges on modern architectures. The optimization strategy varies significantly across architectures, types of stencils, and types of applications. The standard approach to adapting stencil computations to different architectures, used by both compilers and application programmers, is iteration-space tiling, whereby the data footprint of the computation and its computation partitioning are adjusted to match the memory hierarchy and available parallelism of each platform. In this paper, we explore an alternative performance-portability strategy for stencils: a data layout library called bricks, which adapts data footprint and parallelism through fine-grained data blocking. Bricks are designed to exploit the inherent multi-dimensional spatial locality of stencils, facilitating improved code generation that can adapt to CPUs or GPUs and reducing pressure on the memory system. We demonstrate that bricks are performance-portable across CPU and GPU architectures and afford performance advantages over various tiling strategies, particularly for modern multi-stencil and high-order stencil computations. For a range of stencil computations, we achieve high performance on the Intel Knights Landing (Xeon Phi) and Skylake (Xeon) CPUs as well as the NVIDIA P100 (Pascal) GPU, delivering up to a 5x speedup over tiled code.
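To make the fine-grained data-blocking idea concrete, below is a minimal illustrative C sketch. It is not the paper's Brick library API; the brick size, grid size, and all names are assumptions chosen for brevity. It stores a 2-D grid as contiguous 4x4 bricks rather than long row-major rows and sweeps a 5-point stencil brick by brick through a layout-aware index function, which is the basic mechanism by which a bricked layout keeps a stencil's working set compact.

/* Minimal sketch (not the Brick library API): a 2-D grid stored as
 * contiguous fixed-size "bricks" instead of long row-major rows, plus a
 * 5-point stencil that reads through a layout-aware accessor.  Brick
 * dimensions, grid size, and function names are illustrative only. */
#include <stdio.h>
#include <stdlib.h>

#define BDIM   4                 /* brick edge length (elements)      */
#define NBX    8                 /* bricks per grid dimension         */
#define N      (BDIM * NBX)      /* grid edge length = 32             */

/* Index of element (i,j) in the bricked layout: bricks are laid out
 * row-major, and elements inside each brick are also row-major, so a
 * whole 4x4 brick occupies 16 consecutive doubles. */
static inline size_t brick_idx(int i, int j) {
    int bi = i / BDIM, bj = j / BDIM;      /* which brick             */
    int oi = i % BDIM, oj = j % BDIM;      /* offset inside the brick */
    return ((size_t)(bi * NBX + bj) * BDIM + oi) * BDIM + oj;
}

/* 5-point stencil over the interior, sweeping brick by brick so that a
 * brick's 16 elements stay hot in cache while it is updated. */
static void stencil_bricked(const double *in, double *out) {
    for (int bi = 0; bi < NBX; ++bi)
        for (int bj = 0; bj < NBX; ++bj)
            for (int oi = 0; oi < BDIM; ++oi)
                for (int oj = 0; oj < BDIM; ++oj) {
                    int i = bi * BDIM + oi, j = bj * BDIM + oj;
                    if (i == 0 || j == 0 || i == N - 1 || j == N - 1)
                        continue;          /* skip physical boundary  */
                    out[brick_idx(i, j)] =
                        0.25 * (in[brick_idx(i - 1, j)] +
                                in[brick_idx(i + 1, j)] +
                                in[brick_idx(i, j - 1)] +
                                in[brick_idx(i, j + 1)]);
                }
}

int main(void) {
    double *in  = malloc(N * N * sizeof *in);
    double *out = calloc(N * N, sizeof *out);
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            in[brick_idx(i, j)] = (i == 0) ? 1.0 : 0.0;  /* hot top edge */
    stencil_bricked(in, out);
    printf("out(1,16) = %f\n", out[brick_idx(1, 16)]);    /* expects 0.25 */
    free(in); free(out);
    return 0;
}

In the paper's setting the same idea extends to 3-D bricks, with code generation tailoring the intra-brick loops to CPU vector units or GPU threads; this sketch only illustrates the layout transformation itself.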
