CuMF: scale matrix factorization using just ONE machine with GPUs
IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
Workshop on Machine Learning Systems at Neural Information Processing Systems (NIPS), 2015
@inproceedings{tan2015cumf,
  title={CuMF: scale matrix factorization using just ONE machine with GPUs},
  author={Tan, Wei and Cao, Liangliang},
  booktitle={Workshop on Machine Learning Systems at Neural Information Processing Systems (NIPS)},
  year={2015}
}
Matrix factorization (MF) is widely used in recommendation systems. We present cuMF, a highly optimized matrix factorization tool that achieves high performance on graphics processing units (GPUs) by fully utilizing GPU compute power and minimizing the overhead of data movement. Firstly, we introduce a memory-optimized alternating least squares (ALS) method that reduces discontiguous memory access and aggressively uses registers to reduce memory latency. Secondly, we combine data parallelism with model parallelism to scale to multiple GPUs. Results show that with up to four GPUs on one machine, cuMF can be up to ten times as fast as distributed solutions running on sizable clusters for large-scale problems, and performs impressively well when solving the largest matrix factorization problem ever reported.
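For context, ALS factorizes a rating matrix R into low-rank user and item factor matrices by alternately holding one factor fixed and solving a regularized least-squares problem for the other. The NumPy sketch below illustrates that per-row update on a small dense toy matrix; the names (als_step, R, X, Theta, lam) are illustrative assumptions only, not cuMF's CUDA API, and the sketch omits the memory-layout and multi-GPU optimizations the abstract describes.

    # Minimal ALS sketch (illustrative only, not cuMF's implementation).
    import numpy as np

    def als_step(R, fixed, lam):
        # Solve for one factor matrix while 'fixed' (cols x f) is held constant.
        # R: (rows x cols) dense rating matrix, zeros mean unobserved.
        f = fixed.shape[1]
        out = np.zeros((R.shape[0], f))
        for i in range(R.shape[0]):
            observed = R[i] != 0                       # entries this row actually rated
            A = fixed[observed].T @ fixed[observed] + lam * np.eye(f)
            b = fixed[observed].T @ R[i, observed]
            out[i] = np.linalg.solve(A, b)             # per-row regularized normal equations
        return out

    # Alternate between user factors X and item factors Theta.
    rng = np.random.default_rng(0)
    R = rng.integers(0, 6, size=(8, 10)).astype(float)  # toy 8-user x 10-item ratings
    X = rng.standard_normal((8, 4))
    Theta = rng.standard_normal((10, 4))
    for _ in range(10):
        X = als_step(R, Theta, lam=0.1)
        Theta = als_step(R.T, X, lam=0.1)

Each row's update is an independent small linear solve, which is what makes ALS amenable to the massive parallelism and careful memory access patterns that cuMF exploits on GPUs.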
December 6, 2015 by hgpu