VitBit: Enhancing Embedded GPU Performance for AI Workloads through Register Operand Packing

Jaebeom Jeon, Minseong Gil, Junsu Kim, Jaeyong Park, Gunjae Koo, Myung Kuk Yoon, Yunho Oh
Korea University, Seoul, South Korea
Proceedings of the 53rd International Conference on Parallel Processing (ICPP’24), 2024

@inproceedings{jeon2024vitbit,
  title     = {VitBit: Enhancing Embedded GPU Performance for AI Workloads through Register Operand Packing},
  author    = {Jeon, Jaebeom and Gil, Minseong and Kim, Junsu and Park, Jaeyong and Koo, Gunjae and Yoon, Myung Kuk and Oh, Yunho},
  booktitle = {Proceedings of the 53rd International Conference on Parallel Processing},
  pages     = {1012--1021},
  year      = {2024}
}

The rapid advancement of Artificial Intelligence (AI) necessitates significant enhancements in the energy efficiency of Graphics Processing Units (GPUs) for Deep Neural Network (DNN) workloads. This challenge is particularly critical for embedded GPUs, which operate within stringent power constraints. Traditional GPU architectures, designed to support a limited set of numeric formats, struggle to meet the diverse requirements of modern AI applications, which demand support for various numeric formats to optimize computational speed and efficiency. This paper proposes VitBit, a novel software technique designed to overcome these limitations by enabling efficient processing of arbitrary integer formats, especially those of 8 bits or fewer, which are increasingly prevalent in AI workloads. VitBit introduces two key innovations: packing of arbitrary integer formats for parallel computation, and simultaneous execution of Tensor cores alongside INT and FP (integer and floating-point) CUDA cores. This approach leverages architectural features of modern GPUs, such as those based on the NVIDIA Ampere architecture, which allow concurrent operation of FP32 and INT32 cores at full throughput. Our evaluation of VitBit on the NVIDIA Jetson AGX Orin demonstrates substantial improvements in arithmetic density and peak throughput, achieving up to a 22% reduction in execution time for benchmark AI workloads without compromising inference accuracy. VitBit effectively bridges the gap between current hardware capabilities and the computational demands of AI, offering a scalable and cost-effective method for enhancing GPU performance in AI applications.
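
The paper itself details VitBit's packing scheme; purely as an illustration of the general idea of register operand packing (not the authors' implementation), the CUDA sketch below packs four signed 8-bit operands into a single 32-bit register and consumes them with one __dp4a instruction, which is available on compute capability 6.1 and later, including the Ampere GPU of the Jetson AGX Orin. The kernel name, problem size, and launch configuration are hypothetical.

// Illustrative sketch only (not VitBit's actual scheme): four signed 8-bit
// operands are packed into one 32-bit register operand and multiplied and
// accumulated with a single __dp4a instruction (compute capability 6.1+).
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

__global__ void packed_dot(const int8_t* a, const int8_t* b, int* out, int n) {
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * 4;  // 4 elements per thread
    if (i + 3 >= n) return;

    // Pack four consecutive 8-bit operands into one 32-bit register operand.
    unsigned ua = (unsigned)(uint8_t)a[i]
                | (unsigned)(uint8_t)a[i + 1] << 8
                | (unsigned)(uint8_t)a[i + 2] << 16
                | (unsigned)(uint8_t)a[i + 3] << 24;
    unsigned ub = (unsigned)(uint8_t)b[i]
                | (unsigned)(uint8_t)b[i + 1] << 8
                | (unsigned)(uint8_t)b[i + 2] << 16
                | (unsigned)(uint8_t)b[i + 3] << 24;

    // One instruction performs four int8 x int8 multiplies and accumulates into int32.
    atomicAdd(out, __dp4a((int)ua, (int)ub, 0));
}

int main() {
    const int n = 1024;  // hypothetical problem size
    int8_t *a, *b;
    int *out;
    cudaMallocManaged(&a, n);
    cudaMallocManaged(&b, n);
    cudaMallocManaged(&out, sizeof(int));
    for (int i = 0; i < n; ++i) { a[i] = 1; b[i] = 2; }
    *out = 0;
    packed_dot<<<(n / 4 + 255) / 256, 256>>>(a, b, out, n);
    cudaDeviceSynchronize();
    printf("dot = %d\n", *out);  // expected: 2048
    return 0;
}

A plausible compile line for the Orin-class GPU would be nvcc -arch=sm_87 packed_dot.cu; packing this way lets one 32-bit integer instruction process four sub-word operands, which is the arithmetic-density effect the abstract describes.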