https://hgpu.org/?p=987
Analyzing CUDA workloads using a detailed GPU simulator