https://hgpu.org/?p=25026
Automatically Exploiting the Memory Hierarchy of GPUs through Just-in-Time Compilation