https://hgpu.org/?p=8798
Inter-Warp Instruction Temporal Locality in Deep-Multithreaded GPUs