https://hgpu.org/?p=11708
CUDA Implementation of a Lattice Boltzmann Method and Code Optimization