https://hgpu.org/?p=1309
Accelerating incompressible flow computations with a Pthreads-CUDA implementation on small-footprint multi-GPU platforms