https://hgpu.org/?p=8120
Performance Comparison Between Cg-based and CUDA-based Matrix Multiplications