GPU-Qin: A Methodology for Evaluating the Error Resilience of GPGPU Applications

Bo Fang, Karthik Pattabiraman, Matei Ripeanu, Sudhanva Gurumurthi
Department of Electrical and Computer Engineering, University of British Columbia
IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS’14), 2014

@inproceedings{fang2014gpu,
   title={GPU-Qin: A Methodology for Evaluating the Error Resilience of GPGPU Applications},
   author={Fang, Bo and Pattabiraman, Karthik and Ripeanu, Matei and Gurumurthi, Sudhanva},
   booktitle={IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)},
   year={2014}
}

While graphics processing units (GPUs) have gained wide adoption as accelerators for general-purpose applications (GPGPU), the end-to-end reliability implications of their use have not been quantified. Fault injection is a widely used method for evaluating the reliability of applications. However, building a fault injector for GPGPU applications is challenging: their massive parallelism makes it difficult to inject faults representatively while keeping the experiments time-efficient. This paper makes three key contributions. First, it presents the design of a fault-injection methodology to evaluate the end-to-end reliability properties of application kernels running on GPUs. Second, it introduces a fault-injection tool that uses real GPU hardware and offers a good balance between the representativeness and the efficiency of fault-injection experiments. Third, it characterizes the error resilience of twelve GPGPU applications.
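To make the idea of a fault-injection campaign concrete, here is a minimal CPU-only sketch. It is not GPU-Qin's implementation, which works on real GPU hardware. The sketch assumes a single-bit-flip fault model applied to one emulated thread's register (either an array index or a computed value), a toy vector-add kernel, and three outcome categories: benign (output within tolerance of the fault-free "golden" run), silent data corruption (SDC), and crash. Hangs are not modeled. All names (`vector_add_kernel`, `classify`, `campaign`) and the `rel_tol` threshold are hypothetical choices made for illustration.

```python
import math
import random
import struct
from collections import Counter


def flip_bit_float32(value, bit):
    """Flip one bit of the IEEE-754 single-precision encoding of value."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", bits ^ (1 << bit)))
    return flipped


def flip_bit_int32(value, bit):
    """Flip one bit of a 32-bit unsigned index (result stays non-negative)."""
    return (value ^ (1 << bit)) & 0xFFFFFFFF


def vector_add_kernel(a, b, fault=None):
    """Emulate one thread per element. `fault` is a hypothetical
    (thread_id, target, bit) tuple, where target is "index" or "value"."""
    out = [0.0] * len(a)
    for tid in range(len(a)):
        idx = tid
        if fault and fault[0] == tid and fault[1] == "index":
            idx = flip_bit_int32(idx, fault[2])  # corrupt the address register
        value = a[idx] + b[idx]  # out-of-range idx raises IndexError ("crash")
        if fault and fault[0] == tid and fault[1] == "value":
            value = flip_bit_float32(value, fault[2])  # corrupt the data register
        out[idx] = value
    return out


def classify(golden, a, b, fault, rel_tol=1e-6):
    """Run one faulty execution and compare it with the golden output."""
    try:
        result = vector_add_kernel(a, b, fault)
    except IndexError:
        return "crash"
    if all(math.isclose(r, g, rel_tol=rel_tol) for r, g in zip(result, golden)):
        return "benign"
    return "sdc"


def campaign(n_threads=64, trials=1000, seed=0):
    """Inject one random single-bit fault per trial and tally the outcomes."""
    rng = random.Random(seed)
    a = [float(i) for i in range(n_threads)]
    b = [2.0 * i for i in range(n_threads)]
    golden = vector_add_kernel(a, b)  # fault-free reference run
    counts = Counter()
    for _ in range(trials):
        fault = (rng.randrange(n_threads), rng.choice(["index", "value"]), rng.randrange(32))
        counts[classify(golden, a, b, fault)] += 1
    return counts


if __name__ == "__main__":
    print(campaign())
```

Even this toy kernel produces all three outcomes: flipping a high bit of an index runs past the array (crash), flipping a sign bit of a value yields a wrong result (SDC), and flipping the lowest mantissa bit stays within the assumed tolerance (benign). The trade-off named in the abstract shows up in `trials`: more injections give more representative outcome rates but cost more execution time.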
HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
