
On testing GPU memory for hard and soft errors

Guochun Shi, Jeremy Enos, Michael Showerman, Volodymyr Kindratenko
National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Proc. Symposium on Application Accelerators in High-Performance Computing – SAAHPC’09, 2009

@conference{shi2009testing,
   title={On testing GPU memory for hard and soft errors},
   author={Shi, G. and Enos, J. and Showerman, M. and Kindratenko, V.},
   booktitle={Proc. Symposium on Application Accelerators in High-Performance Computing},
   year={2009}
}

NVIDIA GPUs are becoming increasingly popular in scientific computation as a way to accelerate the execution of computationally demanding codes. The graphics memory used in GPUs is not protected against soft errors that may be caused by cosmic radiation, and is thus a source of concern for the scientific computing community. In this short paper we report on an attempt to test GPU memory for both permanent (hard) errors, caused by manufacturing defects and prolonged use, and soft errors, caused by single radiation events. We present a new GPU memory test methodology and show results of error measurements on two large GPU clusters.