https://hgpu.org/?p=15043
CuMF: scale matrix factorization using just ONE machine with GPUs