https://hgpu.org/?p=15909
A Practical Performance Model for Compute and Memory Bound GPU Kernels