Implementing an efficient method of check-pointing on CPU-GPU

Harsha Sutaone, Sharath Prasad, Sumanth Suraneni
Computer-Aided Engineering, College of Engineering, University of Wisconsin-Madison
University of Wisconsin-Madison, 2014

@article{sutaone2014implementing,
   title={Implementing an efficient method of check-pointing on CPU-GPU},
   author={Sutaone, Harsha and Prasad, Sharath and Suraneni, Sumanth},
   year={2014}
}

In this paper, we describe the design, implementation, verification, and analysis of fine-grained architectural support for efficient check-pointing and restart on a heterogeneous CPU-GPU system. We use Multi2Sim, a simulator capable of emulating a CPU-GPU system: a 32-bit x86 CPU launches OpenCL kernels on a GPU model that emulates the Advanced Micro Devices (AMD) Southern Islands architecture. We chose this configuration because Southern Islands is one of the few known commercial GPU architectures, which helps demonstrate that the architectural changes proposed in this paper are feasible with low complexity on real GPU architectures. The AMDAPP benchmark suite, with its OpenCL kernels, is used for verification and analysis. Our implementation leverages the underlying micro-architecture and the execution model to save only the required state, at a much finer granularity, thereby reducing the overhead of checkpoint and restart. The design is verified for correctness by comparing the traces generated by checkpoint and restart with golden execution traces for each of the AMDAPP workloads. We then estimate the size of the files generated during checkpoint and restart and compare it with the size of the complete kernel state of the GPU at any given instant. Our design significantly reduces the memory overhead. Although this paper does not discuss timing overhead, our design does not make drastic changes to the execution model, so we expect the timing overhead to be low.
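
As a rough illustration of saving only the required state and of checking a restart against a golden trace, here is a minimal Python sketch. It does not reproduce the authors' Multi2Sim implementation: the wavefront dictionary layout, the per-wavefront `live` register set, and every function name below are assumptions made purely for illustration.

```python
import pickle


def full_state(wavefronts):
    """Hypothetical 'complete kernel state': every register of every wavefront."""
    return {wf_id: dict(wf["regs"]) for wf_id, wf in wavefronts.items()}


def selective_checkpoint(wavefronts):
    """Keep only the live registers of wavefronts that have not finished.

    Assumption: each wavefront records which registers are still live.
    """
    return {
        wf_id: {reg: wf["regs"][reg] for reg in wf["live"]}
        for wf_id, wf in wavefronts.items()
        if not wf["done"]
    }


def restore(wavefronts, checkpoint):
    """Write the saved register values back into the running state."""
    for wf_id, regs in checkpoint.items():
        wavefronts[wf_id]["regs"].update(regs)


def first_divergence(golden, restarted):
    """Return the index of the first differing trace entry, or None if equal."""
    for i, (g, r) in enumerate(zip(golden, restarted)):
        if g != r:
            return i
    if len(golden) != len(restarted):
        return min(len(golden), len(restarted))
    return None


# Toy state: wavefront 0 is still running with two live registers,
# wavefront 1 has already finished and needs nothing saved.
wavefronts = {
    0: {"regs": {"v0": 1, "v1": 2, "v2": 3, "v3": 4},
        "live": {"v0", "v2"}, "done": False},
    1: {"regs": {"v0": 5, "v1": 6, "v2": 7, "v3": 8},
        "live": set(), "done": True},
}

checkpoint = selective_checkpoint(wavefronts)
selective_bytes = len(pickle.dumps(checkpoint))
full_bytes = len(pickle.dumps(full_state(wavefronts)))
```

In this toy setup, `selective_bytes` can be compared with `full_bytes` in the same spirit as the file-size comparison described above, and `first_divergence` returns `None` when a restarted trace matches the golden one.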
