Fault table generation using Graphics Processing Units
Dept. of ECE, Texas A&M Univ., College Station, TX, USA
IEEE International High Level Design Validation and Test Workshop, 2009. HLDVT 2009
@inproceedings{gulati2009fault,
title={Fault table generation using graphics processing units},
author={Gulati, K. and Khatri, S.P.},
booktitle={High Level Design Validation and Test Workshop, 2009. HLDVT 2009. IEEE International},
pages={60--67},
year={2009},
organization={IEEE}
}
In this paper, we explore the implementation of fault table generation on a Graphics Processing Unit (GPU). A fault table is essential for fault diagnosis and fault detection in VLSI testing and debug. Generating a fault table requires extensive fault simulation with no fault dropping, and is therefore extremely expensive computationally. Fault simulation is inherently parallelizable, and the large number of threads that a GPU can run in parallel can be used to accelerate fault simulation and thereby fault table generation. Our approach, called GFTABLE, is pattern-parallel and uses both bit-parallelism and thread-level parallelism. Our implementation is a significantly modified version of FSIM, which is a pattern-parallel fault simulation approach for single-core processors. Like FSIM, GFTABLE uses critical path tracing and the dominator concept to reduce runtime. Further modifications to FSIM allow us to maximally harness the GPU's huge memory bandwidth and high computational power. Our approach does not store the circuit (or any part of the circuit) on the GPU. GFTABLE also implements efficient parallel reduction operations. We compare our performance to FSIM*, which is FSIM modified to generate a fault table on a single-core processor. Our experiments indicate that GFTABLE, implemented on a single NVIDIA GeForce GTX 280 GPU card, can generate a fault table for 0.5 million test patterns on average 7.85x faster than FSIM*. With the NVIDIA Tesla server, our approach would potentially be 34.82x faster.
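To make the terms "fault table", "no fault dropping", and "bit-parallelism" concrete, here is a minimal Python sketch. It is not GFTABLE or FSIM: it uses naive one-fault-at-a-time simulation, with no critical path tracing, dominators, or GPU threads. The netlist, the net names, and the functions `pack_patterns`, `simulate`, and `fault_table` are all hypothetical. Python integers stand in for machine words, so bit p of every word belongs to test pattern p.

```python
from itertools import product

# Hypothetical toy netlist in topological order: (output_net, gate, input_nets).
NETLIST = [
    ("n1", "AND", ("a", "b")),
    ("out", "OR", ("n1", "c")),
]
INPUTS = ("a", "b", "c")
OUTPUTS = ("out",)


def pack_patterns(patterns):
    """Pack patterns so that bit p of each input word holds pattern p's value."""
    words = {name: 0 for name in INPUTS}
    for p, pattern in enumerate(patterns):
        for name, bit in zip(INPUTS, pattern):
            words[name] |= bit << p
    return words


def simulate(words, mask, fault=None):
    """Evaluate every packed pattern at once.

    fault is None (fault-free circuit) or (net, stuck_value) for a
    single stuck-at fault injected on that net.
    """
    values = {}

    def assign(net, word):
        if fault is not None and fault[0] == net:
            word = mask if fault[1] else 0  # force the net for all patterns
        values[net] = word & mask

    for name in INPUTS:
        assign(name, words[name])
    for out, gate, ins in NETLIST:
        x, y = (values[i] for i in ins)
        assign(out, x & y if gate == "AND" else x | y)
    return values


def fault_table(patterns):
    """Map each stuck-at fault to a bitmask of the patterns that detect it."""
    mask = (1 << len(patterns)) - 1
    words = pack_patterns(patterns)
    good = simulate(words, mask)
    nets = list(INPUTS) + [out for out, _, _ in NETLIST]
    table = {}
    for net in nets:
        for stuck in (0, 1):
            bad = simulate(words, mask, (net, stuck))
            detect = 0
            for o in OUTPUTS:
                detect |= good[o] ^ bad[o]  # outputs differ => fault detected
            # No fault dropping: the fault is simulated against every
            # pattern and the full detection mask is kept.
            table[(net, stuck)] = detect
    return table


patterns = list(product((0, 1), repeat=len(INPUTS)))
table = fault_table(patterns)
for (net, stuck), detect in table.items():
    print(f"{net} stuck-at-{stuck}: {detect:0{len(patterns)}b}")
```

Each printed row is one line of the fault table, with the rightmost bit corresponding to pattern 0. A fault simulator used only for coverage could stop simulating a fault after its first detecting pattern. A fault table needs the complete row, which is why the abstract describes the task as so expensive.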
September 5, 2011 by hgpu