high performance computing on graphics processing units: hgpu.org

NMF-mGPU: non-negative matrix factorization on multi-GPU systems

Edgardo Mejia-Roa, Daniel Tabas-Madrid, Javier Setoain, Carlos Garcia, Francisco Tirado, Alberto Pascual-Montano

ArTeCS Group, Department of Computer Architecture, Complutense University of Madrid (UCM), Madrid 28040, Spain

BMC Bioinformatics 16:43, 2015

@article{mejia2015nmf,
  title={NMF-mGPU: non-negative matrix factorization on multi-GPU systems},
  author={Mej{\'i}a-Roa, Edgardo and Tabas-Madrid, Daniel and Setoain, Javier and Garc{\'i}a, Carlos and Tirado, Francisco and Pascual-Montano, Alberto},
  journal={BMC Bioinformatics},
  volume={16},
  number={1},
  pages={43},
  year={2015},
  publisher={BioMed Central Ltd}
}

BACKGROUND: In the last few years, the Non-negative Matrix Factorization (NMF) technique has gained great interest in the Bioinformatics community, since it is able to extract interpretable parts from high-dimensional datasets. However, the computing time required to process large data matrices may become impractical, even for a parallel application running on a multiprocessor cluster. In this paper, we present NMF-mGPU, an efficient and easy-to-use implementation of the NMF algorithm that takes advantage of the high computing performance delivered by Graphics Processing Units (GPUs). Driven by the ever-growing demands of the video-game industry, the graphics cards usually provided in PCs and laptops have evolved from simple graphics-drawing platforms into high-performance programmable systems that can be used as coprocessors for linear-algebra operations. However, these devices may have a limited amount of on-board memory, a constraint that other GPU implementations of NMF do not take into account.

RESULTS: NMF-mGPU is based on CUDA (Compute Unified Device Architecture), NVIDIA’s framework for GPU computing. On devices with little available memory, large input matrices are transferred block by block from the system’s main memory to the GPU’s memory, and processed accordingly. In addition, NMF-mGPU has been explicitly optimized for the different CUDA architectures. Finally, platforms with multiple GPUs can be synchronized through MPI (Message Passing Interface). In a four-GPU system, this implementation is about 120 times faster than a single conventional processor, and more than four times faster than a single GPU device (i.e., a super-linear speedup).

CONCLUSIONS: Applications of GPUs in Bioinformatics are attracting more and more attention due to their outstanding performance compared to traditional processors. In addition, their relatively low price makes them a highly cost-effective alternative to conventional clusters. In the life sciences, this is an excellent opportunity to facilitate the daily work of bioinformaticians who are trying to extract biological meaning from hundreds of gigabytes of experimental information. NMF-mGPU can be used "out of the box" by researchers with little or no expertise in GPU programming, on a variety of platforms, such as PCs, laptops, or high-end GPU clusters. NMF-mGPU is freely available at https://github.com/bioinfo-cnb/bionmf-gpu.
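The blockwise processing mentioned in the abstract can be illustrated with a minimal CPU-only sketch. This is not the NMF-mGPU code, which is written in CUDA. The abstract does not say which update rule the package uses, so the sketch assumes the standard Lee-Seung multiplicative updates for the Frobenius-norm objective. The names `nmf_blockwise`, `block_rows`, and `frobenius_error` are hypothetical. The idea shown is that only one block of rows of the input matrix V (and of the factor W) has to be resident at a time: the update of H accumulates the small products W^T V and W^T W block by block, and the update of each W block needs only that block's rows plus H.

```python
import random

def matmul(A, B):
    """Plain-Python product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def add_inplace(acc, M):
    """acc += M, element by element."""
    for acc_row, row in zip(acc, M):
        for j, x in enumerate(row):
            acc_row[j] += x

def nmf_blockwise(V, k, block_rows, iters=200, eps=1e-9, seed=0):
    """Factor V (n x m, non-negative) as W (n x k) times H (k x m).

    Assumption: Lee-Seung multiplicative updates; rows of V are visited in
    blocks of `block_rows`, mimicking a device that cannot hold all of V.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H update: accumulate W^T V (k x m) and W^T W (k x k) block by block.
        WtV = [[0.0] * m for _ in range(k)]
        WtW = [[0.0] * k for _ in range(k)]
        for start in range(0, n, block_rows):
            Vb = V[start:start + block_rows]   # the only rows of V "on the device"
            Wb = W[start:start + block_rows]
            Wbt = transpose(Wb)
            add_inplace(WtV, matmul(Wbt, Vb))
            add_inplace(WtW, matmul(Wbt, Wb))
        WtWH = matmul(WtW, H)
        H = [[H[a][j] * WtV[a][j] / (WtWH[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W update: each block needs only its own rows of V and W, plus H.
        Ht = transpose(H)
        HHt = matmul(H, Ht)
        for start in range(0, n, block_rows):
            Vb = V[start:start + block_rows]
            Wb = W[start:start + block_rows]
            VHt = matmul(Vb, Ht)
            WHHt = matmul(Wb, HHt)
            for i, row in enumerate(Wb):
                W[start + i] = [row[a] * VHt[i][a] / (WHHt[i][a] + eps)
                                for a in range(k)]
    return W, H

def frobenius_error(V, W, H):
    """Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH)
               for v, x in zip(rv, rx)) ** 0.5
```

Because the per-block products are simply summed, the block size affects only how much data is resident at once, not the result (up to floating-point rounding). On a GPU, each block would correspond to one host-to-device transfer. In a multi-GPU setting the accumulated products could plausibly be combined across processes with an MPI reduction, although the abstract does not describe that detail.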

Tags: Algorithms, Bioinformatics, Biology, CUDA, Factorization, GPU cluster, MPI, Nonnegative matrix factorization, nVidia, Package, Tesla C1060

February 19, 2015 by hgpu

