https://hgpu.org/?p=1390
How to obtain efficient GPU kernels: an illustration using FMM & FGT algorithms