https://hgpu.org/?p=15040
Optimizing CUDA Shared Memory Usage