high performance computing on graphics processing units: hgpu.org

Matrix Factorization on GPUs with Memory Optimization and Approximate Computing

Wei Tan, Shiyu Chang, Liana Fong, Cheng Li, Zijun Wang, Liangliang Cao

IBM T. J. Watson Research Center

arXiv:1808.03843 [cs.DC], (11 Aug 2018)

@article{tan2018matrix,
  title={Matrix Factorization on GPUs with Memory Optimization and Approximate Computing},
  author={Tan, Wei and Chang, Shiyu and Fong, Liana and Li, Cheng and Wang, Zijun and Cao, Liangliang},
  year={2018},
  month={aug},
  archivePrefix={arXiv},
  primaryClass={cs.DC}
}

Source code package: cumf_als: CUDA Matrix Factorization Library with Alternating Least Square (ALS)

Matrix factorization (MF) discovers latent features from observations and has shown great promise in collaborative filtering, data compression, feature extraction, word embedding, and related fields. While many problem-specific optimization techniques have been proposed, alternating least squares (ALS) remains popular because of its general applicability (for example, it easily handles positive-unlabeled inputs), fast convergence, and suitability for parallelization. Current MF implementations are either optimized for a single machine or require a large computer cluster, and both remain insufficient: a single machine provides limited compute power for large-scale data, while multiple machines suffer from the network communication bottleneck. Accelerating ALS on graphics processing units (GPUs) is a promising way to address this challenge. We propose a novel approach that improves MF efficiency through both memory optimization and approximate computing. The former exploits the GPU memory hierarchy to increase data reuse, while the latter reduces unnecessary computation without hurting the convergence of the learning algorithm. Extensive experiments on large-scale datasets show that our solution not only outperforms competing CPU solutions by a large margin but also achieves a 2x-4x performance gain over state-of-the-art GPU solutions. Our implementations are open-sourced and publicly available.
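
As a rough illustration of the ALS procedure the abstract refers to, the following pure-Python sketch alternates between user and item factors, solving each row's regularized least-squares system with a few warm-started conjugate-gradient steps. The truncated solver is only an assumed stand-in for "approximate computing": the abstract does not say which approximation the authors use, and nothing here reflects the cumf_als API or its GPU memory optimizations. All names (als, update_side, solve_cg, cg_iters, lam) are hypothetical.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def solve_cg(A, b, x0, iters):
    """Run at most `iters` conjugate-gradient steps on A x = b, starting from x0.

    A must be symmetric positive definite. Stopping early gives an
    approximate solution (an assumption made for illustration).
    """
    x = list(x0)
    r = [bi - dot(row, x) for row, bi in zip(A, b)]  # residual b - A x
    p = list(r)
    rs = dot(r, r)
    for _ in range(iters):
        if rs < 1e-20:  # already converged
            break
        Ap = [dot(row, p) for row in A]
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def update_side(obs_by_row, fixed, current, rank, lam, cg_iters):
    """Update every row of one factor matrix while the other side stays fixed."""
    updated = []
    for row, obs in enumerate(obs_by_row):
        # Normal equations: (sum_v t_v t_v^T + lam * I) x = sum_v r_v t_v
        A = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for col, rating in obs:
            t = fixed[col]
            for i in range(rank):
                b[i] += rating * t[i]
                for j in range(rank):
                    A[i][j] += t[i] * t[j]
        updated.append(solve_cg(A, b, current[row], cg_iters))
    return updated


def objective(ratings, X, T, lam):
    """Squared error on observed entries plus L2 regularization."""
    sse = sum((r - dot(X[u], T[v])) ** 2 for u, v, r in ratings)
    reg = lam * (sum(dot(x, x) for x in X) + sum(dot(t, t) for t in T))
    return sse + reg


def als(ratings, n_users, n_items, rank=3, lam=0.05, sweeps=30, cg_iters=2, seed=0):
    """ALS on (user, item, rating) triples; returns factors and objective history."""
    rng = random.Random(seed)
    X = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_users)]
    T = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_items)]
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for u, v, r in ratings:
        by_user[u].append((v, r))
        by_item[v].append((u, r))
    history = [objective(ratings, X, T, lam)]
    for _ in range(sweeps):
        X = update_side(by_user, T, X, rank, lam, cg_iters)  # items fixed
        T = update_side(by_item, X, T, rank, lam, cg_iters)  # users fixed
        history.append(objective(ratings, X, T, lam))
    return X, T, history
```

Calling als(ratings, n_users, n_items) on a list of (user, item, rating) triples returns the two factor matrices and the objective value after each sweep. Because every row update is independent of the others, the loop inside update_side is the part that a GPU implementation would run in parallel; warm-starting conjugate gradient from the previous factors means even a truncated solve does not increase the objective.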

Tags: Algorithms, Compression, Computer science, CUDA, Factorization, nVidia, nVidia GeForce GTX Titan X, Package

August 19, 2018 by hgpu
