The Promises of Hybrid Hexagonal/Classical Tiling for GPU
PARKAS (INRIA Paris-Rocquencourt), INRIA – Ecole normale superieure de Paris – ENS Paris – CNRS : UMR 8548
hal-00848691, (27 July 2013)
@techreport{grosser:hal-00848691,
hal_id={hal-00848691},
url={http://hal.inria.fr/hal-00848691},
title={The Promises of Hybrid Hexagonal/Classical Tiling for GPU},
author={Grosser, Tobias and Verdoolaege, Sven and Cohen, Albert and Sadayappan, P.},
language={Anglais},
affiliation={PARKAS – INRIA Paris-Rocquencourt, Department of Computer Science and Engineering – CSE},
type={Rapport de recherche},
institution={INRIA},
number={RR-8339},
year={2013},
month={Jul},
pdf={http://hal.inria.fr/hal-00848691/PDF/RR-8339.pdf}
}
Time-tiling is necessary for efficient execution of iterative stencil computations. But the usual hyper-rectangular tiles cannot be used because of positive/negative dependence distances along the stencil’s spatial dimensions. Several prior efforts have addressed this issue. However, known techniques trade enhanced data reuse for other causes of inefficiency, such as unbalanced parallelism, redundant computations, or increased control flow overhead incompatible with efficient GPU execution. We explore a new path to maximize the effectiveness of time-tiling on iterative stencil computations. Our approach is particularly well suited for GPUs. It does not require any redundant computations, it favors coalesced global-memory access and data reuse in shared-memory/cache, avoids thread divergence, and extracts a high degree of parallelism. We introduce hybrid hexagonal tiling, combining hexagonal tile shapes along the time (sequential) dimension and one spatial dimension, with classical tiling for other spatial dimensions. A hexagonal tile shape simultaneously enables parallel tile execution and reuse along the time dimension. Experimental results demonstrate significant performance improvements over existing stencil compilers.
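To make the remark about positive/negative dependence distances concrete, here is a minimal sketch for a 3-point 1D Jacobi stencil, which is our own illustrative choice and not an example taken from the report. It checks the standard sufficient condition for rectangular tiling (all dependence distance components non-negative) and shows that classical time skewing restores it. Skewing is one of the known alternatives, not the hybrid hexagonal scheme proposed in the report; the names DEPS, rectangular_tiling_is_legal and skew are hypothetical.

```python
# Illustrative sketch (assumption: 3-point 1D Jacobi stencil, not from the report).
# A[t][x] reads A[t-1][x-1], A[t-1][x] and A[t-1][x+1], so the dependence
# distances (sink minus source) in (t, x) are:
DEPS = [(1, -1), (1, 0), (1, 1)]


def rectangular_tiling_is_legal(deps):
    """Sufficient condition: every distance component is non-negative."""
    return all(component >= 0 for dep in deps for component in dep)


def skew(deps, factor=1):
    """Apply the classical skew x' = x + factor * t to each distance vector."""
    return [(dt, dx + factor * dt) for dt, dx in deps]


if __name__ == "__main__":
    print("original distances:", DEPS)
    print("rectangular tiling legal:", rectangular_tiling_is_legal(DEPS))
    skewed = skew(DEPS)
    print("skewed distances:", skewed)
    print("rectangular tiling legal after skewing:",
          rectangular_tiling_is_legal(skewed))
```

The distance (1, -1) is what rules out hyper-rectangular tiles in the original iteration space. Skewing makes such tiles legal, but every skewed distance then has a non-negative spatial component, so neighbouring tiles along that dimension depend on one another and can only start in a pipelined (wavefront) fashion. This is one form of the unbalanced parallelism the abstract mentions.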
July 31, 2013 by hgpu