https://hgpu.org/?p=8729
uBench: Performance Impact of CUDA Block Geometry