Locality-Aware Mapping of Nested Parallel Patterns on GPUs
Stanford University
47th International Symposium on Microarchitecture (MICRO’14), 2014
@inproceedings{lee2014locality,
title={Locality-Aware Mapping of Nested Parallel Patterns on GPUs},
author={Lee, HyoukJoong and Brown, Kevin J and Sujeeth, Arvind K and Rompf, Tiark and Olukotun, Kunle},
booktitle={47th International Symposium on Microarchitecture (MICRO'14)},
year={2014}
}
Recent work has explored using higher-level languages to improve programmer productivity on GPUs. These languages often utilize high-level computation patterns (e.g., Map and Reduce) that encode parallel semantics to enable automatic compilation to GPU kernels. However, the problem of efficiently mapping patterns to GPU hardware becomes significantly more difficult when the patterns are nested, which is common in nontrivial applications. To address this issue, we present a general analysis framework for automatically and efficiently mapping nested patterns onto GPUs. The analysis maps nested patterns onto a logical multidimensional domain and parameterizes the block size and degree of parallelism in each dimension. We then add GPU-specific hard and soft constraints to prune the space of possible mappings and select the best mapping. We also perform multiple compiler optimizations that are guided by the mapping to avoid dynamic memory allocations and automatically utilize shared memory within GPU kernels. We compare the performance of our automatically selected mappings to hand-optimized implementations on multiple benchmarks and show that the average performance gap on 7 out of 8 benchmarks is 24%. Furthermore, our mapping strategy outperforms simple 1D mappings and existing 2D mappings by up to 28.6x and 9.6x respectively.
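The search described in the abstract (enumerate mappings of nest levels to logical dimensions, prune with hard constraints, rank with soft constraints) can be illustrated with a minimal Python sketch. This is not the authors' framework: the function names, the candidate block sizes, the three soft constraints, and their weights are all hypothetical choices made for illustration. Only the 1024-threads-per-block limit and the warp size of 32 are taken from current CUDA hardware.

```python
from itertools import permutations, product

MAX_THREADS_PER_BLOCK = 1024  # hardware limit on current CUDA GPUs
WARP_SIZE = 32                # threads per warp on current CUDA GPUs
DIMS = ("x", "y")             # logical dimensions (two-level nest only)
BLOCK_SIZES = (1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024)  # assumed candidates


def threads_per_block(sizes):
    """Total threads in a block for the given per-level block sizes."""
    total = 1
    for s in sizes:
        total *= s
    return total


def candidate_mappings(num_levels):
    """Yield every (dims, sizes) pair: one dimension and one block size per nest level."""
    for dims in permutations(DIMS, num_levels):
        for sizes in product(BLOCK_SIZES, repeat=num_levels):
            yield dims, sizes


def satisfies_hard(sizes):
    """Hard constraint: a block may not exceed the device thread limit."""
    return threads_per_block(sizes) <= MAX_THREADS_PER_BLOCK


def soft_score(dims, sizes, level_sizes, sequential_level):
    """Score a mapping with illustrative soft constraints (weights are assumptions)."""
    score = 0
    # Assumed soft constraint 1: the level with unit-stride accesses sits on
    # dimension x with at least one warp of threads, which favors coalescing.
    if dims[sequential_level] == "x" and sizes[sequential_level] >= WARP_SIZE:
        score += 10
    # Assumed soft constraint 2: no more threads along a level than it has iterations.
    for s, n in zip(sizes, level_sizes):
        if s <= n:
            score += 1
    # Assumed soft constraint 3: enough threads per block to keep the GPU busy.
    if threads_per_block(sizes) >= 256:
        score += 1
    return score


def select_mapping(level_sizes, sequential_level):
    """Prune by the hard constraint, then return the highest-scoring mapping."""
    feasible = [
        (dims, sizes)
        for dims, sizes in candidate_mappings(len(level_sizes))
        if satisfies_hard(sizes)
    ]
    return max(
        feasible,
        key=lambda m: soft_score(m[0], m[1], level_sizes, sequential_level),
    )


if __name__ == "__main__":
    # Outer pattern over 4096 rows, inner pattern over 64 contiguous columns.
    dims, sizes = select_mapping(level_sizes=(4096, 64), sequential_level=1)
    print("dimension per level:", dims)
    print("block size per level:", sizes)
```

In this sketch the inner level, which touches contiguous elements, ends up on dimension x with at least a warp of threads. A real compiler would derive the access patterns and level sizes from the program rather than take them as arguments, and would weigh many more constraints.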
November 12, 2014 by hgpu