https://hgpu.org/?p=8047
Optimizing Memory-Bound Numerical Kernels on GPU Hardware Accelerators