https://hgpu.org/?p=15573
Faster and Cheaper: Parallelizing Large-Scale Matrix Factorization on GPUs