Cost-Effective Soft-Error Protection for SRAM-Based Structures in GPGPUs
Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66045, USA
Proceedings of the ACM International Conference on Computing Frontiers (CF ’13), 2013
@inproceedings{tan2013cost,
title={Cost-effective soft-error protection for {SRAM}-based structures in {GPGPUs}},
author={Tan, Jingweijia and Li, Zhi and Fu, Xin},
booktitle={Proceedings of the ACM International Conference on Computing Frontiers},
pages={29},
year={2013},
organization={ACM}
}
General-purpose computing on graphics processing units (GPGPU) is increasingly used to accelerate parallel applications. This makes reliability a growing concern in GPUs, which were originally designed for graphics processing with relaxed requirements for execution correctness. As CMOS process technologies continue to scale down to the nanoscale, the on-chip soft-error rate (SER) has been predicted to increase exponentially. GPGPUs, which integrate hundreds of cores on a single chip, are prone to high SER. This paper aims to enhance GPGPU reliability in the presence of soft errors. We leverage GPGPU microarchitecture characteristics and propose energy-efficient protection mechanisms for two typical SRAM-based structures (i.e., the instruction buffer and the registers) that are highly susceptible to soft errors. We develop the Similarity-AWare Protection (SAWP) scheme, which leverages instruction similarity to provide near-full ECC protection to the instruction buffer with very little area and power overhead. Based on the observation that shared memory usually exhibits low utilization, we propose the SHAred memory to Register Protection (SHARP) scheme, which leverages shared memory to hold the ECCs of registers. Experimental results show that our techniques substantially reduce structure vulnerability and significantly reduce power consumption compared to the full ECC protection mechanism.
June 13, 2013 by hgpu