high performance computing on graphics processing units: hgpu.org

Implementing an efficient method of check-pointing on CPU-GPU

Harsha Sutaone, Sharath Prasad, Sumanth Suraneni

Computer-Aided Engineering, College of Engineering, University of Wisconsin-Madison

University of Wisconsin-Madison, 2014

In this paper, we describe the design, implementation, verification, and analysis of fine-grained architectural support for efficient check-pointing and restart on a heterogeneous CPU-GPU system. We use Multi2Sim, a simulator capable of emulating a CPU-GPU system: it emulates a 32-bit x86 CPU that launches OpenCL kernels on a GPU model of the Advanced Micro Devices (AMD) Southern Islands architecture. We chose this configuration because it is one of the few known commercial GPU architectures. This helps demonstrate that the architectural changes proposed in this paper are feasible with low complexity on real GPU architectures. The AMDAPP benchmark suite of OpenCL kernels is used for verification and analysis. Our implementation leverages the underlying micro-architecture and the execution model to save only the required state, at a much finer granularity, thereby reducing the overhead of checkpoint and restart. The design is verified for correctness by comparing the traces generated by checkpoint and restart with golden execution traces for each of the AMDAPP workloads. We then estimate the size of the files generated during checkpoint and restart and compare it with the size of the complete kernel state of the GPU at any given instant. Our design significantly reduces the memory overhead. Although this paper does not discuss timing overhead, our design does not make drastic changes to the execution model, so we expect the timing overhead to be low.
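
The trace-based verification described in the abstract can be illustrated with a minimal sketch. It assumes that each trace is a sequence of text lines, one per executed instruction, and that the trace of a checkpointed run is the portion recorded before the checkpoint followed by the portion recorded after restart. The trace format and the function name `first_divergence` are hypothetical and are not taken from the paper or from Multi2Sim.

```python
from itertools import zip_longest
from typing import Iterable, Optional, Tuple

Mismatch = Tuple[int, Optional[str], Optional[str]]


def first_divergence(golden: Iterable[str],
                     candidate: Iterable[str]) -> Optional[Mismatch]:
    """Return (index, golden_line, candidate_line) for the first mismatch.

    Returns None when both traces are identical. A trace that ends early
    is reported as a mismatch against None at the first missing index.
    """
    for index, (g, c) in enumerate(zip_longest(golden, candidate)):
        if g != c:
            return index, g, c
    return None


# Hypothetical trace lines; a real trace format would differ.
golden_trace = ["wf0 pc=0x00", "wf0 pc=0x04", "wf0 pc=0x08", "wf0 pc=0x0c"]

# Assumed structure of a checkpointed run: lines recorded up to the
# checkpoint, then lines recorded after restarting from it.
before_checkpoint = ["wf0 pc=0x00", "wf0 pc=0x04"]
after_restart = ["wf0 pc=0x08", "wf0 pc=0x0c"]

result = first_divergence(golden_trace, before_checkpoint + after_restart)
print("traces match" if result is None else f"first mismatch: {result}")
```

Reporting the first point of divergence, rather than a plain pass or fail, is a design choice made here for illustration: it points directly at the instruction where restored state first differs from the uninterrupted run.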

Tags: ATI, Computer science, Heterogeneous systems, OpenCL

May 6, 2014 by hgpu
