Software-Based ECC for GPUs
Tokyo Institute of Technology, JST CREST
Symposium on Application Accelerators in High Performance Computing, 2009 (SAAHPC’09)
@conference{maruyama2009software,
title={Software-based {ECC} for {GPU}s},
author={Maruyama, N. and Nukada, A. and Matsuoka, S.},
booktitle={2009 Symposium on Application Accelerators in High Performance Computing (SAAHPC'09)},
year={2009}
}
Commodity off-the-shelf GPUs lack error-checking mechanisms for graphics memory, whereas conventional HPC platforms have used hardware-based ECC for DRAM. To alleviate this reliability concern, we propose a software-based ECC for GPGPU applications. We add small pieces of code to ordinary CUDA programs that compute ECCs for data residing in graphics memory, so that transient bit-flips can be detected or masked. Preliminary performance studies with 3-D FFT and the N-body problem show that ECC-based error checking incurs overheads of 200% and 7%, respectively. We discuss how these performance overheads derive from the cost of ECC computation on GPUs.
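To make the idea of detecting or masking bit-flips in software concrete, the following is a minimal host-side sketch in Python of a single-error-correcting, double-error-detecting (SEC-DED) Hamming code over 32-bit words. It is an illustration under assumptions, not the authors' implementation: the abstract does not specify which code is used, and the names `encode`, `decode`, and `POSITIONS` are hypothetical. In a CUDA program, the same logic would run inside kernels whenever protected data is written to or read from graphics memory.

```python
"""Illustrative SEC-DED sketch for 32-bit words (assumed scheme, not the paper's code)."""


def data_positions(n_bits):
    """1-based codeword positions that hold data: every non-power-of-two."""
    positions = []
    pos = 1
    while len(positions) < n_bits:
        if pos & (pos - 1):  # skip powers of two, reserved for check bits
            positions.append(pos)
        pos += 1
    return positions


POSITIONS = data_positions(32)
INDEX_OF = {pos: i for i, pos in enumerate(POSITIONS)}


def _xor_of_positions(word):
    """XOR of the codeword positions of all data bits that are set."""
    acc = 0
    for i, pos in enumerate(POSITIONS):
        if (word >> i) & 1:
            acc ^= pos
    return acc


def _popcount(x):
    return bin(x).count("1")


def encode(word):
    """Return (check, parity): 6 Hamming check bits plus one overall parity bit."""
    check = _xor_of_positions(word)
    parity = (_popcount(word) + _popcount(check)) & 1
    return check, parity


def decode(word, check, parity):
    """Return (status, word); status is 'ok', 'corrected', or 'uncorrectable'."""
    syndrome = _xor_of_positions(word) ^ check
    odd = (_popcount(word) + _popcount(check) + parity) & 1
    if syndrome == 0 and not odd:
        return "ok", word
    if odd:
        if syndrome in INDEX_OF:
            # Single flipped data bit: the syndrome names its position.
            return "corrected", word ^ (1 << INDEX_OF[syndrome])
        if syndrome & (syndrome - 1) == 0:
            # The flip hit a check bit or the parity bit; the data is intact.
            return "corrected", word
    # Even overall parity with a nonzero syndrome indicates a double-bit error.
    return "uncorrectable", word
```

For example, `check, parity = encode(0xDEADBEEF)` yields the redundancy to store alongside the word, and `decode(0xDEADBEEF ^ (1 << 7), check, parity)` recovers the original value. The per-word loop over data bits hints at why ECC computation itself can dominate the overhead for memory-bound kernels.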
February 21, 2011 by hgpu