Coarse grain computation-communication overlap for efficient application-level checkpointing for GPUs

L.D. Solano-Quinde, B.M. Bode, A.K. Somani
Dept. of Electr. & Comput. Eng., Iowa State Univ., Ames, IA, USA
IEEE International Conference on Electro/Information Technology (EIT), 2010

@inproceedings{solano2010coarse,
   title={Coarse grain computation-communication overlap for efficient application-level checkpointing for GPUs},
   author={Solano-Quinde, L.D. and Bode, B.M. and Somani, A.K.},
   booktitle={Electro/Information Technology (EIT), 2010 IEEE International Conference on},
   pages={1--5},
   organization={IEEE},
   year={2010}
}

Graphics Processing Units (GPUs) are increasingly used to solve non-graphical scientific problems. However, GPU reliability has been shown to be a concern because of the occurrence of soft and hard errors. Checkpoint/restart is the most commonly used technique for achieving fault tolerance in the presence of failures. This work presents an application-level checkpoint scheme for systems composed of GPUs. Our scheme exploits the divide-and-conquer technique and communication-computation overlap to improve execution time and checkpoint overhead. By dividing the problem, and its checkpointing, into n subprocesses, we show that our scheme reduces the checkpoint overhead by a factor of n. We also show that dividing the problem with finer granularity is not beneficial.
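
The factor-of-n claim can be illustrated with a small timing model. The sketch below is not the paper's implementation. It assumes a two-stage pipeline in which the checkpoint transfer of one subproblem runs while the next subproblem is being computed, and in which compute and checkpoint costs split evenly across the n subproblems. The function name and parameters are hypothetical.

```python
def exposed_checkpoint_overhead(compute_time, checkpoint_time, n):
    """Checkpoint time that is NOT hidden behind computation.

    Assumed model (illustrative only): the work is split into n equal
    subproblems; the checkpoint of subproblem i is transferred while
    subproblem i+1 is being computed; transfers are serialized.
    """
    if n < 1:
        raise ValueError("n must be at least 1")

    compute_chunk = compute_time / n        # compute cost per subproblem
    checkpoint_chunk = checkpoint_time / n  # checkpoint cost per subproblem

    compute_done = 0.0   # time at which the current subproblem finishes computing
    transfer_done = 0.0  # time at which the latest checkpoint transfer finishes

    for _ in range(n):
        compute_done += compute_chunk
        # A transfer starts once its data is computed and the previous
        # transfer has finished, whichever happens later.
        transfer_done = max(transfer_done, compute_done) + checkpoint_chunk

    # Everything beyond the pure compute time is exposed checkpoint overhead.
    return transfer_done - compute_time


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, exposed_checkpoint_overhead(8.0, 4.0, n))
```

Under these assumptions only the last subproblem's checkpoint is exposed, so the overhead is checkpoint_time / n, as long as each per-subproblem checkpoint takes no longer than the per-subproblem computation. The model ignores any fixed per-subproblem cost, so it says nothing about the abstract's observation that finer granularity is not beneficial.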

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
