https://hgpu.org/?p=28880
Scalable Tuning of (OpenMP) GPU Applications via Kernel Record and Replay