Cache Miss Analysis for GPU Programs Based on Stack Distance Profile
31st International Conference on Distributed Computing Systems (ICDCS), 2011
@inproceedings{tang2011cache,
title={Cache Miss Analysis for {GPU} Programs Based on Stack Distance Profile},
author={Tang, T. and Yang, X. and Lin, Y.},
booktitle={2011 31st International Conference on Distributed Computing Systems},
pages={623--634},
year={2011},
organization={IEEE}
}
Using the graphics processing unit (GPU) to accelerate general-purpose computation has attracted much attention from both academia and industry due to the GPU's powerful computing capacity, and optimizing GPU programs has become a popular research direction. To support general-purpose computing more efficiently, GPUs have integrated a general-purpose data cache to replace the existing software-managed on-chip memory. Consequently, improving data cache utilization is of vital importance to the performance of GPU programs. The foundation of cache locality optimization is efficient analysis and prediction of cache behavior. Unfortunately, existing cache miss analysis models are based on sequential programs and thus cannot be used to analyze GPU programs directly. In this paper, based on a deep analysis of the GPU's execution model, we propose, for the first time, a cache miss analysis model for GPU programs. We divide the problem into two subproblems: stack distance profile analysis of a single thread block, and cache contention analysis among multiple thread blocks. Experimental results from nine typical application kernels in scientific computing illustrate that our method is efficient and can be used to guide cache locality optimizations for GPU programs.
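The stack distance idea behind the first subproblem can be illustrated with a minimal sketch (the function names and the fully associative LRU assumption are ours, not the paper's): each access's stack distance is the number of distinct cache lines touched since the previous access to the same line, and under LRU an access misses exactly when that distance is at least the cache capacity.

```python
from collections import OrderedDict

def stack_distance_profile(trace):
    """Return the LRU stack distance of each access in a reference trace.

    Distance = number of distinct lines touched since the previous access
    to the same line; cold (first-time) accesses get distance infinity.
    """
    stack = OrderedDict()  # lines ordered least- to most-recently used
    profile = []
    for line in trace:
        if line in stack:
            keys = list(stack.keys())
            # Count distinct lines above this one on the LRU stack.
            profile.append(len(keys) - keys.index(line) - 1)
            del stack[line]
        else:
            profile.append(float("inf"))  # cold miss
        stack[line] = None  # (re)insert as most recently used
    return profile

def predict_misses(profile, cache_lines):
    """Under fully associative LRU, an access misses iff its stack
    distance is >= the number of lines the cache can hold."""
    return sum(1 for d in profile if d >= cache_lines)

# Example trace of cache-line identifiers:
trace = ["a", "b", "c", "a", "b", "d", "a"]
profile = stack_distance_profile(trace)
# profile: three cold misses, then reuses at distance 2
print(profile)                   # [inf, inf, inf, 2, 2, inf, 2]
print(predict_misses(profile, 3))  # 4 (only the cold misses)
print(predict_misses(profile, 2))  # 7 (every access misses)
```

Because the profile depends only on the trace, it can be computed once and then queried for any cache size, which is what makes stack distance a convenient basis for cache miss prediction; the paper's contribution is extending this single-thread-block view with contention analysis across concurrent thread blocks.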
August 9, 2011 by hgpu