Adding fault tolerance to OpenCL: Through redundant heterogeneous computing

Robin Alexander Bijl
TU Delft Electrical Engineering, Mathematics and Computer Science
Delft University of Technology, 2023

@article{bijl2023adding,
   title={Adding fault tolerance to OpenCL: Through redundant heterogeneous computing},
   author={Bijl, Robin},
   year={2023}
}


The ever-increasing demand for computing has led to the need for specialized heterogeneous hardware and for the frameworks required to utilize it. Besides traditional central processing units, more and more programs will make use of specialized hardware to accelerate computations. However, the increase in computing also leads to a shorter mean time between failures. In this thesis, we add fault tolerance to the Portable Computing Language (PoCL), an open-source implementation of the OpenCL standard. We show that our solution is easy to apply to existing programs that use PoCL/OpenCL and that it greatly reduces the total number of errors visible to the end user. Our solution can be used on any device supported by PoCL and incurs low overhead, provided that the hardware requirements are met.