https://hgpu.org/?p=9476
Analysis of Parallel Montgomery Multiplication in CUDA