Many-Thread Aware Prefetching Mechanisms for GPGPU Applications
College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA
43rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2010
@inproceedings{hyesoonmany,
title={Many-Thread Aware Prefetching Mechanisms for GPGPU Applications},
author={Lee, Jaekyu and Lakshminarayana, Nagesh B. and Kim, Hyesoon and Vuduc, Richard},
booktitle={43rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)},
year={2010}
}
We consider the problem of how to improve memory latency tolerance in massively multithreaded GPGPUs when the thread-level parallelism of an application is not sufficient to hide memory latency. One solution used in conventional CPU systems is prefetching, in both hardware and software. However, we show that straightforwardly applying such mechanisms to GPGPU systems does not deliver the expected performance benefits and can in fact hurt performance when not used judiciously. This paper proposes new hardware and software prefetching mechanisms tailored to GPGPU systems, which we refer to as many-thread aware prefetching (MT-prefetching) mechanisms. Our software MT-prefetching mechanism, called inter-thread prefetching, exploits common memory access behavior among fine-grained threads. For hardware MT-prefetching, we describe a scalable prefetcher training algorithm along with a hardware-based inter-thread prefetching mechanism. In some cases, blindly applying prefetching degrades performance. To reduce such negative effects, we propose an adaptive prefetch throttling scheme, which permits automatic adjustment to the GPGPU application and hardware. We show that this adaptation reduces the negative effects of prefetching and can even improve performance. Overall, compared to state-of-the-art software and hardware prefetching, our MT-prefetching improves performance on our benchmarks by 16% on average (software prefetching) and 15% (hardware prefetching).
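To make the software mechanism concrete, the following is a minimal CUDA sketch of inter-thread-style prefetching, not the paper's actual implementation: each thread issues a non-binding L2 prefetch (via the PTX prefetch.global.L2 hint) for the element that a thread several blocks ahead will read, relying on the regular per-thread access pattern the abstract describes. The kernel name scale_with_itp and the PREFETCH_DIST distance are illustrative placeholders, not names from the paper.

// Hedged sketch of software inter-thread prefetching in CUDA.
// Each thread loads its own element and also issues a non-binding
// L2 prefetch for the element that a thread PREFETCH_DIST blocks
// ahead will load (that block may not have been scheduled yet).
#include <cstdio>
#include <cuda_runtime.h>

#define PREFETCH_DIST 4  // blocks ahead to prefetch for (assumed tuning knob)

__device__ __forceinline__ void prefetch_l2(const void* p) {
    // Non-binding prefetch into L2 (PTX, sm_20+); a hint, not a load.
    asm volatile("prefetch.global.L2 [%0];" :: "l"(p));
}

__global__ void scale_with_itp(const float* __restrict__ in,
                               float* __restrict__ out, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Prefetch the element that the corresponding thread
    // PREFETCH_DIST blocks ahead will read.
    int pf = i + PREFETCH_DIST * blockDim.x;
    if (pf < n) prefetch_l2(&in[pf]);

    if (i < n) out[i] = s * in[i];
}

int main() {
    const int n = 1 << 20;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = float(i);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_with_itp<<<blocks, threads>>>(in, out, n, 2.0f);
    cudaDeviceSynchronize();

    printf("out[42] = %f\n", out[42]);
    cudaFree(in);
    cudaFree(out);
    return 0;
}

The sketch only illustrates the idea of one thread prefetching on behalf of another; the paper's adaptive throttling would additionally gate such prefetches at run time when they hurt performance.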
April 4, 2011 by hgpu