https://hgpu.org/?p=15739
LightScan: Faster Scan Primitive on CUDA Compatible Manycore Processors