Software Reliability Enhancements for GPU Applications

Si Li, Naila Farooqui, Sudhakar Yalamanchili
School of Electrical and Computer Engineering, Georgia Institute of Technology, USA
Sixth Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG-2013), held in conjunction with the 8th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), 2013








As the role of highly parallel accelerators becomes more important in high-performance computing, so does the need to ensure their reliable operation. In applications where precision and correctness are necessities, bit-level reliable operation is required. While mechanisms for error detection and correction exist, their cost-effective implementation in massively parallel accelerators is still an active area of research. In this paper we present an alternative software-based approach for improving the reliability of massively parallel bulk synchronous processors such as modern GPUs. Specifically, we propose a set of software reliability enhancements via transparent code patching of GPU applications. Reliability enhancements can be applied selectively at runtime, customized by the user, and kept transparent to the application. Runtime overhead ranges from 1% to 737%, depending on the nature of the enhancement. We provide an analysis of benefits and limitations.
