https://hgpu.org/?p=9098
A journey from single-GPU to optimized multi-GPU SPH with CUDA