Energy-efficient mechanisms for managing thread context in throughput processors

Mark Gebhart, Daniel R. Johnson, David Tarjan, Stephen W. Keckler, William J. Dally, Erik Lindholm, Kevin Skadron
The University of Texas at Austin, Austin, TX, USA
Proceedings of the 38th annual international symposium on Computer architecture, ISCA ’11

@inproceedings{gebhart2011energy,

   title={Energy-efficient mechanisms for managing thread context in throughput processors},

   author={Gebhart, M. and Johnson, D.R. and Tarjan, D. and Keckler, S.W. and Dally, W.J. and Lindholm, E. and Skadron, K.},

   booktitle={Proceedings of the 38th annual international symposium on Computer architecture},

   pages={235--246},

   year={2011},

   organization={ACM}

}

Modern graphics processing units (GPUs) use a large number of hardware threads to hide both function unit and memory access latency. Extreme multithreading requires a complicated thread scheduler as well as a large register file, which is expensive to access both in terms of energy and latency. We present two complementary techniques for reducing energy on massively threaded processors such as GPUs. First, we examine register file caching to replace accesses to the large main register file with accesses to a smaller structure containing the immediate register working set of active threads. Second, we investigate a two-level thread scheduler that maintains a small set of active threads to hide ALU and local memory access latency and a larger set of pending threads to hide main memory latency. Combined with register file caching, a two-level thread scheduler provides a further reduction in energy by limiting the allocation of temporary register cache resources to only the currently active subset of threads. We show that on average, across a variety of real-world graphics and compute workloads, a 6-entry per-thread register file cache reduces the number of reads and writes to the main register file by 50% and 59%, respectively. We further show that the active thread count can be reduced by a factor of 4 with minimal impact on performance, resulting in a 36% reduction of register file energy.
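
As a rough illustration of the register file caching idea in the abstract, the following Python sketch counts main register file accesses for one thread with a small cache in front of it. It is a toy model under stated assumptions, not the paper's design: the class and function names are hypothetical, the FIFO replacement, allocate-on-write-only policy, and write-back on flush are choices made for this sketch, and the trace is invented. Only the default of 6 entries per thread comes from the abstract.

```python
from collections import OrderedDict


class RegisterFileCache:
    """Toy per-thread register file cache (RFC) in front of a main register file (MRF).

    Assumptions of this sketch (not confirmed by the abstract):
    FIFO replacement, allocation on writes only, write-back of dirty entries.
    """

    def __init__(self, entries=6):
        self.entries = entries
        self.cache = OrderedDict()  # register id -> dirty flag, kept in FIFO order
        self.mrf_reads = 0
        self.mrf_writes = 0

    def read(self, reg):
        # A hit is served by the small cache; a miss goes to the MRF.
        # Assumption: read misses do not allocate a cache entry.
        if reg not in self.cache:
            self.mrf_reads += 1

    def write(self, reg):
        if reg in self.cache:
            # Overwrite in place; the entry keeps its FIFO position.
            self.cache[reg] = True
            return
        if len(self.cache) >= self.entries:
            # Evict the oldest entry and write it back if it is dirty.
            _, victim_dirty = self.cache.popitem(last=False)
            if victim_dirty:
                self.mrf_writes += 1
        self.cache[reg] = True

    def flush(self):
        # Models the thread leaving the active set: dirty values return to the MRF.
        self.mrf_writes += sum(1 for dirty in self.cache.values() if dirty)
        self.cache.clear()


def run_trace(trace, entries=6):
    """Replay (op, reg) pairs, op being "r" or "w"; return (MRF reads, MRF writes)."""
    rfc = RegisterFileCache(entries)
    for op, reg in trace:
        if op == "r":
            rfc.read(reg)
        else:
            rfc.write(reg)
    rfc.flush()
    return rfc.mrf_reads, rfc.mrf_writes


# Invented trace: two results are produced and reused shortly afterwards,
# and one long-lived value (register 9) is read from the MRF.
example_trace = [("w", 1), ("r", 1), ("w", 2), ("r", 1), ("r", 2), ("r", 9)]
```

Calling `run_trace(example_trace)` replays the trace with the 6-entry default, and `run_trace(example_trace, entries=1)` shows how a smaller cache pushes more traffic back to the main register file. In this model the flush step is also where a two-level scheduler matters: only threads in the active set hold cache entries, so a thread's dirty values are written back when it moves to the pending set.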