https://hgpu.org/?p=905
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA