https://hgpu.org/?p=16652
CuMF_SGD: Fast and Scalable Matrix Factorization