Energy-efficient mechanisms for managing thread context in throughput processors
The University of Texas at Austin, Austin, TX, USA
Proceedings of the 38th Annual International Symposium on Computer Architecture, ISCA ’11
@inproceedings{gebhart2011energy,
title={Energy-efficient mechanisms for managing thread context in throughput processors},
author={Gebhart, M. and Johnson, D.R. and Tarjan, D. and Keckler, S.W. and Dally, W.J. and Lindholm, E. and Skadron, K.},
booktitle={Proceedings of the 38th Annual International Symposium on Computer Architecture},
pages={235--246},
year={2011},
organization={ACM}
}
Modern graphics processing units (GPUs) use a large number of hardware threads to hide both function unit and memory access latency. Extreme multithreading requires a complicated thread scheduler as well as a large register file, which is expensive to access both in terms of energy and latency. We present two complementary techniques for reducing energy on massively threaded processors such as GPUs. First, we examine register file caching to replace accesses to the large main register file with accesses to a smaller structure containing the immediate register working set of active threads. Second, we investigate a two-level thread scheduler that maintains a small set of active threads to hide ALU and local memory access latency and a larger set of pending threads to hide main memory latency. Combined with register file caching, a two-level thread scheduler provides a further reduction in energy by limiting the allocation of temporary register cache resources to only the currently active subset of threads. We show that on average, across a variety of real-world graphics and compute workloads, a 6-entry per-thread register file cache reduces the number of reads and writes to the main register file by 50% and 59%, respectively. We further show that the active thread count can be reduced by a factor of 4 with minimal impact on performance, resulting in a 36% reduction of register file energy.
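To make the register file caching idea in the abstract more concrete, here is a minimal toy model in Python. It is only a sketch and not the authors' design. The abstract does not state the replacement policy, the allocation rule, or when the cache is flushed, so the choices below are assumptions made for illustration: FIFO replacement, allocation on writes only, and a write-back of all cached values when a thread leaves the active set. The names `RegisterFileCache`, `run`, and `TRACE` are hypothetical.

```python
from collections import OrderedDict


class RegisterFileCache:
    """Toy per-thread register file cache (RFC) in front of a main register file (MRF)."""

    def __init__(self, entries=6):
        self.entries = entries
        # Cached register names in insertion order (assumed FIFO replacement).
        self.cache = OrderedDict()
        self.mrf_reads = 0   # reads that had to go to the main register file
        self.mrf_writes = 0  # writes that reached the main register file

    def read(self, reg):
        # Assumption: a read miss fetches from the MRF and does not allocate.
        if reg not in self.cache:
            self.mrf_reads += 1

    def write(self, reg):
        # Assumption: every result is written into the RFC first.
        if reg in self.cache:
            return  # overwrite in place; FIFO position is unchanged
        if len(self.cache) >= self.entries:
            # Evict the oldest entry and write its value back to the MRF.
            self.cache.popitem(last=False)
            self.mrf_writes += 1
        self.cache[reg] = None

    def flush(self):
        # Assumption: when the thread leaves the active set, every cached
        # value is written back so the entries can be reused.
        self.mrf_writes += len(self.cache)
        self.cache.clear()


def run(trace, entries=6):
    """Replay (dest, sources) instructions; return (MRF reads, MRF writes)."""
    rfc = RegisterFileCache(entries)
    for dest, sources in trace:
        for src in sources:
            rfc.read(src)
        rfc.write(dest)
    rfc.flush()
    return rfc.mrf_reads, rfc.mrf_writes


# A made-up four-instruction trace: r0 is a live-in value never written here.
TRACE = [
    ("r1", ("r0",)),
    ("r2", ("r1",)),
    ("r3", ("r1", "r2")),
    ("r1", ("r3",)),
]

if __name__ == "__main__":
    baseline_reads = sum(len(sources) for _, sources in TRACE)
    baseline_writes = len(TRACE)
    print("without cache:", (baseline_reads, baseline_writes))
    print("with 6-entry cache:", run(TRACE))
```

In this model a short-lived value that is overwritten while still cached never costs a main register file write, and operands produced a few instructions earlier are read from the small structure instead. The flush step is where the two-level scheduler would come in: only threads in the active set hold cache entries, so the cache can stay small. The counts from this toy trace say nothing about the 50% and 59% reductions reported in the abstract, which come from the authors' own workloads.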
September 7, 2011 by hgpu