https://hgpu.org/?p=6311
Optimizing the multipole-to-local operator in the fast multipole method for graphical processing units