https://hgpu.org/?p=7630
GMProf: A Low-Overhead, Fine-Grained Profiling Approach for GPU Programs