https://hgpu.org/?p=4372
Coarse grain computation-communication overlap for efficient application-level checkpointing for GPUs