https://hgpu.org/?p=11321
A Performance Criteria for parallel Computation on basis of block size using CUDA Architecture