https://hgpu.org/?p=13889
A Survey of Techniques for Modeling and Improving Reliability of Computing Systems