https://hgpu.org/?p=2441
Diagnosis, Tuning, and Redesign for Multicore Performance: A Case Study of the Fast Multipole Method