CuMF: scale matrix factorization using just ONE machine with GPUs

Wei Tan, Liangliang Cao
IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
Workshop on Machine Learning Systems at Neural Information Processing Systems (NIPS), 2015








Matrix factorization (MF) is widely used in recommendation systems. We present cuMF, a highly optimized matrix factorization tool for graphics processing units (GPUs) that achieves high performance by fully utilizing GPU compute power and minimizing the overhead of data movement. First, we introduce a memory-optimized alternating least squares (ALS) method that reduces discontiguous memory access and aggressively uses registers to reduce memory latency. Second, we combine data parallelism with model parallelism to scale to multiple GPUs. Results show that, with up to four GPUs on one machine, cuMF can be up to ten times as fast as implementations running on sizable clusters on large-scale problems, and that it performs well when solving the largest matrix factorization problem reported to date.
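For readers unfamiliar with ALS, the following is a minimal, dependency-free CPU sketch of the basic alternating scheme the abstract refers to. It is not cuMF's GPU implementation and contains none of its memory optimizations or multi-GPU parallelism. The function names (`solve_spd`, `update_factors`, `als`, `rmse`), the plain L2 regularization weight `lam`, and the default parameter values are all assumptions made for illustration. Each step fixes one factor matrix and solves a small regularized least-squares system per row of the other.

```python
import random


def solve_spd(a, b):
    """Solve a x = b for a small symmetric positive-definite matrix (Cholesky)."""
    n = len(b)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = s ** 0.5 if i == j else s / low[j][j]
    # Forward substitution: low * y = b
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    # Back substitution: low^T * x = y
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def update_factors(observed, fixed, f, lam):
    """Recompute one factor matrix while the other (`fixed`) is held constant.

    observed[i] lists (index_into_fixed, rating) pairs for row i.
    Assumption: plain L2 regularization with weight `lam`.
    """
    out = []
    for obs in observed:
        # Normal equations: (sum v v^T + lam * I) x = sum r * v
        a = [[lam if i == j else 0.0 for j in range(f)] for i in range(f)]
        b = [0.0] * f
        for idx, r in obs:
            v = fixed[idx]
            for i in range(f):
                b[i] += r * v[i]
                for j in range(f):
                    a[i][j] += v[i] * v[j]
        out.append(solve_spd(a, b))
    return out


def als(ratings, m, n, f=2, lam=0.1, iters=20, seed=0):
    """Factorize sparse ratings [(user, item, value), ...] into X (m x f), Theta (n x f)."""
    rng = random.Random(seed)
    theta = [[rng.random() for _ in range(f)] for _ in range(n)]
    by_user = [[] for _ in range(m)]
    by_item = [[] for _ in range(n)]
    for u, v, r in ratings:
        by_user[u].append((v, r))
        by_item[v].append((u, r))
    x = []
    for _ in range(iters):
        x = update_factors(by_user, theta, f, lam)   # fix Theta, solve for X
        theta = update_factors(by_item, x, f, lam)   # fix X, solve for Theta
    return x, theta


def rmse(ratings, x, theta):
    """Root-mean-square error over the observed ratings only."""
    se = sum((r - sum(p * q for p, q in zip(x[u], theta[v]))) ** 2
             for u, v, r in ratings)
    return (se / len(ratings)) ** 0.5
```

Calling `als(ratings, m, n)` on a list of `(user, item, value)` triples returns the two factor matrices, and `rmse` reports the fit on the observed entries. In this sketch every row update is independent of the others, which is the property that makes ALS a natural candidate for parallel hardware such as GPUs.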
