High Performance Stencil Code Algorithms for GPGPUs
author={Schäfer, Andreas and Fey, Dietmar},
title={{High Performance Stencil Code Algorithms for GPGPUs}},
journal={Procedia Computer Science},
volume={4},
year={2011},
pages={2027--2036},
ee={http://dx.doi.org/10.1016/j.procs.2011.04.221},
bibsource={DBLP, http://dblp.uni-trier.de}
}
implemented on state-of-the-art general purpose graphics processing
units (GPGPUs). Stencil codes can be found at the core of many
numerical solvers and physical simulation codes and are therefore of
particular interest to scientific computing research. GPGPUs have
gained a lot of attention recently because of their superior
floating-point performance and memory bandwidth. Nevertheless,
memory-bound stencil codes in particular have proven challenging
for GPGPUs, yielding lower-than-expected speedups.
We chose the Jacobi method as a standard benchmark to evaluate a set
of algorithms on NVIDIA’s latest Fermi chipset. One of our fastest
algorithms is a parallel wavefront update. It exploits the enlarged
on-chip shared memory to perform two time step updates per sweep. To
the best of our knowledge, it represents the first successful
application of temporal blocking for 3D stencils on GPGPUs and
thereby exceeds previous results by a considerable margin. This paper
is also the first to study stencil codes on Fermi.
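
The idea of performing two time step updates per sweep can be
illustrated with a minimal CPU sketch in plain Python. This is not the
authors' CUDA kernel. It assumes a 7-point Jacobi stencil that averages
the six face neighbours, fixed boundary values, and a grid of at least
three planes in z. The function names (jacobi_plane, jacobi_sweep,
wavefront_two_steps) are hypothetical. A small rolling buffer of
intermediate planes stands in for the on-chip shared memory mentioned
above.

```python
def jacobi_plane(below, mid, above):
    """Update one z-plane from itself and its two z-neighbours.

    Assumed stencil: average of the six face neighbours.
    Cells on the plane's edge are copied unchanged (fixed boundary).
    """
    ny, nx = len(mid), len(mid[0])
    out = [row[:] for row in mid]
    for y in range(1, ny - 1):
        for x in range(1, nx - 1):
            out[y][x] = (below[y][x] + above[y][x] +
                         mid[y - 1][x] + mid[y + 1][x] +
                         mid[y][x - 1] + mid[y][x + 1]) / 6.0
    return out


def jacobi_sweep(grid):
    """Reference: one full time step; planes z=0 and z=nz-1 stay fixed."""
    nz = len(grid)
    new = [[row[:] for row in plane] for plane in grid]
    for z in range(1, nz - 1):
        new[z] = jacobi_plane(grid[z - 1], grid[z], grid[z + 1])
    return new


def wavefront_two_steps(grid):
    """Advance the grid by two time steps in a single pass over z.

    Planes of time step t+1 are kept in a rolling buffer (at most three
    at a time). As soon as planes z-2, z-1 and z of step t+1 exist,
    plane z-1 of step t+2 is computed and written out.
    """
    nz = len(grid)
    out = [[row[:] for row in plane] for plane in grid]
    step1 = {0: grid[0]}  # boundary plane is identical at every time step
    for z in range(1, nz):
        if z < nz - 1:
            step1[z] = jacobi_plane(grid[z - 1], grid[z], grid[z + 1])
        else:
            step1[z] = grid[z]  # last plane is a fixed boundary
        if z >= 2:
            out[z - 1] = jacobi_plane(step1[z - 2], step1[z - 1], step1[z])
            del step1[z - 2]  # no longer needed; keeps the buffer small
    return out


# Usage: a small 5 x 4 x 6 grid with arbitrary deterministic values.
grid = [[[float((x * 7 + y * 3 + z * 5) % 11) for x in range(6)]
         for y in range(4)] for z in range(5)]
two_steps = wavefront_two_steps(grid)
```

In this sketch the single wavefront pass performs the same arithmetic
in the same order as two separate sweeps, so both produce identical
results while the input grid is traversed only once.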