high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Biology » Fast Parallel Markov Clustering in Bioinformatics Using Massively Parallel Graphics Processing Unit Computing

Fast Parallel Markov Clustering in Bioinformatics Using Massively Parallel Graphics Processing Unit Computing

Alhadi Bustamam, Kevin Burrage, Nicholas A. Hamilton

Inst. for Mol. Biosci., Univ. of Queensland, Brisbane, QLD, Australia

Ninth International Workshop on, and High Performance Computational Systems Biology, Second International Workshop on Parallel and Distributed Methods in Verification, 2010

DOI:10.1109/PDMC-HiBi.2010.23

@inproceedings{bustamam2010fast,

title={Fast Parallel Markov Clustering in Bioinformatics Using Massively Parallel Graphics Processing Unit Computing},

author={Bustamam, A. and Burrage, K. and Hamilton, N.A.},

booktitle={2010 Ninth International Workshop on Parallel and Distributed Methods in Verification/2010 Second International Workshop on High Performance Computational Systems Biology},

pages={116–125},

year={2010},

organization={IEEE}

}

Source

1615

views

Markov clustering is becoming a key algorithm with in bioinformatics for determining clusters in networks. For instance, clustering protein interaction networks is helping find genes implicated in diseases such as cancer. However, with fast sequencing and other technologies generating vast amounts of data on biological networks, performance and scalability issues are becoming a critical limiting factorin applications. Meanwhile, Graphics Processing (GPU)computing, which uses a massively parallel computing environment in the GPU card, is becoming a very powerful, efficient and low cost option to achieve substantial performance gains over CPU approaches. This paper introduces a very fast Markov clustering algorithm (MCL) based on massive parallel computing in GPU. We use the Compute Unified Device Architecture (CUDA) to allow the GPU to perform parallel sparse matrix-matrix computations and parallel sparse Markov matrix normalizations, which are at the heart of the clustering algorithm. The key to optimizing our CUDA Markov Clustering (CUDAMCL) was utilizing ELLACK-R sparse data format to allow the effective and fine-grain massively parallel processing to cope with the sparse nature of interaction networks datasets in bioinformatics applications. CUDA also allows us to use on-chip memory on the GPU efficiently, to lower the latency time thus circumventing a major issue in other parallel computing environments, such as Message Passing Interface (MPI). Here we describe the GPU algorithm and its application to several real world problems as well as to artificial datasets. We find that the principle factor causing variation in performance of the GPU approach is the relative sparseness of networks. Comparing GPU computation times against a modern quad-core CPU on the published(relatively sparse) standard BIOGRID protein interaction networks with 5156 and 23175 nodes, speed factors of 4 times and 9 were obtained, respectively. On the Human Protein Reference Database, the sp–eed of clustering of 19599 proteins was improved by a factor of 7 by the GPU algorithm. However, on artificially generated densely connected networks with 1600 to 4800 nodes, speedups by a factor in the range 40 to 120 times were readily obtained. As the results show, in all cases the GPU implementation is significantly faster than the original MCL running on CPU. Such approaches are allowing large-scale parallel computation on off-the-shelf desktop machines that were previously only possible on super-computing architectures, and have the potential to significantly change the way bioinformaticians and biologists compute and interact with their data.

Tags: Bioinformatics, Biology, CUDA, MPI, nVidia, Sparse matrix

May 30, 2011 by hgpu

No votes yet.

Please wait...

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

* * *

high performance computing on graphics processing units: hgpu.org

Fast Parallel Markov Clustering in Bioinformatics Using Massively Parallel Graphics Processing Unit Computing

Recent source codes

QArray

Celerity: High-level C++ for Accelerator Clusters

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 second

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA

Optical flow algorithms for SYCL

OpenMP5-Offload-OpenMC-Intel-PVC

Most viewed papers (last 30 days)

Fast Parallel Markov Clustering in Bioinformatics Using Massively Parallel Graphics Processing Unit Computing

Share this:

Recent source codes

Most viewed papers (last 30 days)