https://hgpu.org/?p=16870
Accelerating Lattice QCD Multigrid on GPUs Using Fine-Grained Parallelization