The More We Share, The More We Have: Improving GPU performance through Register Sharing

Vishwesh Jatala, Jayvant Anantpur, Amey Karkare
Department of CSE, IIT Kanpur, Kanpur, India
arXiv:1503.05694 [cs.AR] (19 Mar 2015)











Graphics Processing Units (GPUs) consisting of Streaming Multiprocessors (SMs) achieve high throughput by running a large number of threads and context switching among them to hide execution latencies. The amount of thread-level parallelism that can be exploited depends on the number of resident threads on each SM. Threads are typically structured into a grid of thread blocks, with each thread block containing a large number of threads. The number of thread blocks, and hence the number of threads, that can be launched on an SM depends on the resource usage of the thread blocks (e.g., number of registers, amount of shared memory). Because threads are allocated to an SM at thread-block granularity, some of the resources may not be used up completely and are therefore wasted. We propose an approach, Register Sharing, that uses the wasted registers in SMs to launch more thread blocks and hence increase the number of resident threads. We further propose three optimizations that make effective use of these extra thread blocks to hide long execution latencies and hence reduce the number of stall cycles. We experimentally validated our approach using the GPGPU-Sim simulator on several applications from three benchmark suites: GPGPU-Sim, Rodinia, and Parboil. We observed a maximum improvement of 24% and an average improvement of 11% with a very small hardware overhead.
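To make the block-granularity waste described in the abstract concrete, here is a minimal sketch of the arithmetic. It is not the paper's mechanism or code. The function name `resident_blocks` and all numeric values (32768 registers per SM, 48 KB of shared memory, a limit of 8 blocks per SM, 256 threads per block, 36 registers per thread) are illustrative assumptions, and the sketch ignores hardware details such as register allocation granularity.

```python
def resident_blocks(regs_per_sm, smem_per_sm, max_blocks_per_sm,
                    threads_per_block, regs_per_thread, smem_per_block):
    """Return (resident blocks per SM, registers left unused).

    Hypothetical helper: a whole thread block is either resident or not,
    so the block count is the floor of each resource limit.
    """
    regs_per_block = threads_per_block * regs_per_thread
    # Limit imposed by the register file.
    by_regs = regs_per_sm // regs_per_block
    # Limit imposed by shared memory (no limit if the kernel uses none).
    by_smem = (smem_per_sm // smem_per_block
               if smem_per_block else max_blocks_per_sm)
    blocks = min(by_regs, by_smem, max_blocks_per_sm)
    # Registers that no resident block can use.
    wasted_regs = regs_per_sm - blocks * regs_per_block
    return blocks, wasted_regs


# Assumed example configuration (not taken from the paper).
blocks, wasted = resident_blocks(
    regs_per_sm=32768, smem_per_sm=49152, max_blocks_per_sm=8,
    threads_per_block=256, regs_per_thread=36, smem_per_block=0)
print(f"resident blocks: {blocks}, unused registers: {wasted}")
```

Under these assumed numbers each block needs 9216 registers, so only three blocks fit and 5120 registers sit idle: not enough for a fourth block on its own, which is the kind of leftover the abstract describes Register Sharing as putting to use.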
