https://hgpu.org/?p=1379
Fast Histograms using Adaptive CUDA Streams