https://hgpu.org/?p=6813
Architecture-Aware Mapping and Optimization on a 1600-Core GPU