Performance in GPU Architectures: Potentials and Distances

Ahmad Lashgar, Amirali Baniasadi
School of Electrical and Computer Engineering, College of Engineering, University of Tehran
9th Annual Workshop on Duplicating, Deconstructing, and Debunking (WDDD), 2011





GPUs can achieve peak performance of up to one TFLOPS. This peak, however, is rarely reached, due to resource underutilization. Three parameters contribute to this inefficiency: branch divergence, memory access delays, and limited workload parallelism. We suggest machine models to estimate the performance gains obtainable by eliminating each performance-degrading parameter. Such estimates indicate how much improvement designers could expect from investing in different GPU subsections. Moreover, our models show how much performance is lost, compared to an ideal GPU, as a result of non-ideal GPU components. We conclude that memory is by far the most important of the three parameters impacting performance. We show that, in the presence of an ideal memory system, GPU performance can reach within 59% of an ideal system. Meanwhile, an ideal control-flow mechanism or unlimited execution resources do not deliver the same impact. In fact, as we show in this study, ideal control flow can harm performance by increasing pressure on the memory system. In addition, we study our models under GPUs exploiting aggressive memory systems and well-equipped Streaming Multiprocessors. We investigate how previously suggested control-flow solutions impact performance-degrading issues, and we make recommendations for enhancing control-flow mechanisms.
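To illustrate the branch-divergence inefficiency the abstract refers to, the following is a minimal sketch (not from the paper; the model and function names are assumptions for illustration) of how SIMT execution serializes a divergent branch: the threads of a warp share one instruction stream, so when a branch splits the warp, both paths execute back to back with the complementary lanes masked off.

```python
# Toy model of SIMT branch divergence (illustrative assumption, not the
# paper's machine model). A warp of 32 lanes that splits across a branch
# must issue both paths serially, wasting the masked-off lanes.

WARP_SIZE = 32

def divergent_cycles(taken_lanes, cost_taken, cost_not_taken):
    """Cycles a warp spends on a branch, assuming serialized reconvergence."""
    if taken_lanes == 0:
        return cost_not_taken            # whole warp falls through
    if taken_lanes == WARP_SIZE:
        return cost_taken                # whole warp takes the branch
    return cost_taken + cost_not_taken   # divergent: both paths run serially

def lane_utilization(taken_lanes, cost_taken, cost_not_taken):
    """Fraction of issued lane-cycles that perform useful work."""
    cycles = divergent_cycles(taken_lanes, cost_taken, cost_not_taken)
    useful = taken_lanes * cost_taken + (WARP_SIZE - taken_lanes) * cost_not_taken
    return useful / (cycles * WARP_SIZE)

# A warp split 16/16 across two equal 10-cycle paths wastes half its lanes:
print(lane_utilization(16, 10, 10))  # 0.5
print(lane_utilization(32, 10, 10))  # 1.0 (no divergence)
```

Under this toy model, an "ideal control-flow mechanism" would push utilization to 1.0 regardless of the split, which is precisely what makes its memory-side effect (more concurrent useful work, hence more pressure on the memory system) visible in the paper's results.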

* * *


HGPU group © 2010-2021 hgpu.org

All rights belong to the respective authors
