high performance computing on graphics processing units: hgpu.org


GPU-accelerated protein family identification for metagenomics

Changjun Wu, Ananth Kalyanaraman

Xerox Innovation Group, Xerox Research Center, Webster, NY, USA

IEEE 27th International Symposium on Parallel & Distributed Processing Workshops and PhD Forum, 2013

@article{wu2013gpu,
  title={GPU-accelerated protein family identification for metagenomics},
  author={Wu, Changjun and Kalyanaraman, Ananth},
  year={2013}
}


The clustering of putative protein/Open Reading Frame (ORF) sequences available from large-scale metagenomics survey projects is a core analytical function that has led to the identification and characterization of novel protein families of environmental microbial communities. Implementing this function, however, is currently challenged not only by data size but also by data complexity. In this paper, we present a CPU-GPU implementation of Shingling, a randomized graph clustering heuristic originally developed by Gibson et al. Our implementation assigns different stages of the computation to the CPU and the GPU, using the GPU for the most time-consuming steps. Experimental results on a 2M ocean metagenomics data set obtained from the Sorcerer II Global Ocean Sampling project show that our new implementation achieves a ~7X speedup over our serial implementation without using asynchronous CPU-GPU communication, while the GPU-accelerated part alone achieves a speedup of over ~374X. Qualitative evaluation on the 2M data set shows that our method improves clustering sensitivity over existing methods and recruits more sequences into clusters without impacting overall specificity. As a demonstration of a large-scale run, we clustered a real-world homology graph containing 11M vertices and 640M edges, constructed from sequences of an ongoing Pacific Ocean metagenomics survey project, in about 94 minutes.
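The abstract names Shingling (Gibson et al.) without describing it. As a rough illustration only, the Python sketch below shows the min-wise hashing idea behind a first shingling pass: for each vertex, under each of `c` hash functions, it keeps the `s` neighbors with the smallest hash values as a "shingle", then groups vertices that share a shingle. The parameter names `s` and `c`, the linear hash family, the use of closed neighborhoods, the function names, and the single-pass union-find grouping are all assumptions of this sketch. It is not the authors' CPU-GPU implementation, and the original heuristic additionally shingles the resulting shingles before extracting dense subgraphs.

```python
import heapq
import random
from collections import defaultdict

PRIME = 2_147_483_647  # 2**31 - 1, modulus for the linear hash functions


def make_hashes(c, seed=0):
    """Return c hash functions h(x) = (a*x + b) mod PRIME.

    Each one stands in for a random permutation of integer vertex IDs
    (a min-wise hashing approximation assumed by this sketch).
    """
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(c)]
    return [lambda x, a=a, b=b: (a * x + b) % PRIME for a, b in params]


def shingles(members, s, hashes):
    """For each hash, the s members with the smallest hash values, as a sorted tuple."""
    if len(members) < s:
        return []  # too few members to form a shingle of size s
    return [tuple(sorted(heapq.nsmallest(s, members, key=h))) for h in hashes]


def shingle_clusters(graph, s=3, c=4, seed=0):
    """Group vertices that share at least one first-level shingle.

    graph maps every integer vertex to an iterable of its neighbors.
    Simplified (hypothetical helper): one shingling pass plus union-find,
    not the two-level procedure of Gibson et al. or the paper's pipeline.
    """
    hashes = make_hashes(c, seed)
    owners = defaultdict(set)  # shingle -> vertices whose neighborhood produced it
    for v, nbrs in graph.items():
        closed = set(nbrs) | {v}  # closed neighborhood (assumption of this sketch)
        for sh in shingles(closed, s, hashes):
            owners[sh].add(v)

    parent = {v: v for v in graph}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge all vertices that produced the same shingle.
    for vs in owners.values():
        first, *rest = vs
        for u in rest:
            parent[find(u)] = find(first)

    groups = defaultdict(list)
    for v in sorted(graph):
        groups[find(v)].append(v)
    return sorted(groups.values())


if __name__ == "__main__":
    # Two 4-cliques joined by the single edge 3-4.
    g = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
         4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6]}
    print(shingle_clusters(g))
```

Vertices with identical neighborhoods always produce identical shingles, so members of a dense clique tend to collapse into one group, while two vertices whose closed neighborhoods overlap in fewer than `s` elements can never share a shingle directly. Whether the bridge vertices 3 and 4 join their cliques depends on the random hashes, so no fixed output is shown.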

Tags: Biology, Clustering, CUDA, Graph theory, nVidia, Tesla K20

May 25, 2013 by hgpu
