https://hgpu.org/?p=7213
Efficient parallel implementation of the lattice Boltzmann method on large clusters of graphic processing units