https://hgpu.org/?p=3250
Implementation of algorithms with a fine-grained parallelism on GPUs