Error Resilience Evaluation on GPGPU Applications

Bo Fang
The University of British Columbia
The University of British Columbia, 2014










While graphics processing units (GPUs) have gained wide adoption as accelerators for general-purpose applications (GPGPU), the end-to-end reliability implications of their use have not been quantified. Fault injection is a widely used method for evaluating the reliability of applications. However, building a fault injector for GPGPU applications is challenging: their massive parallelism makes it difficult to obtain representative results in a time-efficient way. This thesis makes three key contributions. First, it presents the design of a fault-injection methodology to evaluate the end-to-end reliability properties of application kernels running on GPUs. Second, it introduces a fault-injection tool that runs on real GPU hardware and balances the representativeness of fault-injection experiments against their efficiency. Third, it characterizes the error resilience of twelve GPGPU applications. In addition, this thesis provides preliminary insights into correlations between algorithm properties and the measured silent data corruption rates of applications.
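As a rough illustration of what a fault-injection experiment measures, the sketch below runs a toy computation many times, corrupts one value per run, and counts how often the output is silently wrong. Everything in it is an assumption made for illustration and is not taken from the thesis or its tool: the single-bit-flip fault model on 32-bit floats, the CPU-side `kernel` stand-in, the three outcome labels, and the names `flip_bit`, `classify`, and `run_campaign`.

```python
import math
import random
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0..31) in the IEEE-754 float32 encoding of value."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (out,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return out


def kernel(data, fault=None):
    """Toy stand-in for an application kernel: sum of squares.

    fault is an optional (index, bit) pair (hypothetical fault model):
    the element at that index has one bit flipped before it is used.
    """
    acc = 0.0
    for i, x in enumerate(data):
        if fault is not None and fault[0] == i:
            x = flip_bit(x, fault[1])
        acc += x * x
    return acc


def classify(golden: float, observed: float, tol: float = 1e-6) -> str:
    """Label one run by comparing its output with the fault-free output."""
    if not math.isfinite(observed):
        return "detected"  # non-finite output, used here as a proxy for a visible failure
    if abs(observed - golden) <= tol * max(1.0, abs(golden)):
        return "benign"    # the fault was masked
    return "sdc"           # silent data corruption: wrong output, no error signal


def run_campaign(data, trials: int, seed: int = 0):
    """Inject one random fault per trial and count the outcomes."""
    rng = random.Random(seed)
    golden = kernel(data)
    counts = {"benign": 0, "sdc": 0, "detected": 0}
    for _ in range(trials):
        fault = (rng.randrange(len(data)), rng.randrange(32))
        counts[classify(golden, kernel(data, fault))] += 1
    return counts


if __name__ == "__main__":
    data = [1.0, 0.5, 2.0, 0.25]
    counts = run_campaign(data, trials=1000)
    print(counts, "SDC rate:", counts["sdc"] / 1000)
```

Even in this toy setting the algorithm shapes the outcome: because the kernel squares its inputs, a flipped sign bit is always masked, while a flipped high exponent bit can push the result to infinity.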
