https://hgpu.org/?p=8618
Neither More Nor Less: Optimizing Thread-level Parallelism for GPGPUs