high performance computing on graphics processing units: hgpu.org


cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on a GPU

Jing Zhang, Hao Wang, Heshan Lin, Wu-chun Feng

Dept. of Computer Science, Virginia Tech

IEEE International Parallel and Distributed Processing Symposium, 2014

@InProceedings{zhang-cublastp-ipdps14,
  author={Zhang, Jing and Wang, Hao and Lin, Heshan and Feng, Wu-chun},
  title={cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on a GPU},
  booktitle={IEEE International Parallel and Distributed Processing Symposium},
  address={Phoenix, Arizona, USA},
  month={May},
  year={2014}
}


BLAST, short for Basic Local Alignment Search Tool, is a fundamental algorithm in the life sciences that compares biological sequences. However, with the advent of next-generation sequencing (NGS) and the increase in sequence read lengths, whether at the outset or downstream from NGS, the exponential growth of sequence databases is arguably outstripping our ability to analyze the data. Though several recent studies have utilized the graphics processing unit (GPU) to speed up the BLAST algorithm for searching protein sequences (i.e., BLASTP), these studies used coarse-grained parallel approaches, where one sequence alignment is mapped to only one thread. Moreover, due to the irregular memory access patterns in BLASTP, there remain significant challenges in mapping the most time-consuming phases (i.e., hit detection and ungapped extension) to the GPU using a fine-grained multithreaded approach. To address the above issues, we propose cuBLASTP, an efficient fine-grained BLASTP implementation for the GPU using CUDA. Our cuBLASTP realization encompasses many research contributions, including (1) memory-access reordering to reorder hits from column-major order to diagonal-major order, (2) position-based indexing to map a hit with a packed data structure to a bin, (3) aggressive hit filtering to eliminate hits beyond the threshold distance along the diagonal, (4) diagonal-based parallelism and hit-based parallelism for ungapped extension to extend sequences with different lengths in databases, and (5) hierarchical buffering to reduce memory-access overhead for the core data structures. The experimental results show that on an NVIDIA Kepler GPU, cuBLASTP delivers up to a 5.0-fold speedup over sequential FSA-BLAST and a 3.7-fold speedup over multithreaded NCBI-BLAST for the overall program execution.
In addition, compared with GPU-BLASTP (the fastest GPU implementation of BLASTP to date), cuBLASTP achieves up to a 2.8-fold speedup for the kernel execution on the GPU and a 1.8-fold speedup for the overall program execution.
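
Contributions (1) to (3) can be illustrated with a small host-side sketch. This is not the authors' CUDA code: it is a sequential Python illustration under stated assumptions. The function names, the 16-bit position field (`POS_BITS`), and the simplified filter rule (keep a hit only if its predecessor on the same diagonal is within `threshold` positions) are all hypothetical choices made for the example.

```python
from collections import defaultdict

POS_BITS = 16  # assumed width of the subject-position field in the packed key


def pack_hit(query_pos, subject_pos, query_len):
    """Pack a hit into one integer: diagonal id in the high bits, subject position in the low bits."""
    diagonal = subject_pos - query_pos + query_len  # shifted so the id is never negative
    return (diagonal << POS_BITS) | subject_pos


def unpack_hit(key, query_len):
    """Recover (query_pos, subject_pos) from a packed key."""
    diagonal = key >> POS_BITS
    subject_pos = key & ((1 << POS_BITS) - 1)
    query_pos = subject_pos - (diagonal - query_len)
    return query_pos, subject_pos


def reorder_and_filter(hits, query_len, threshold):
    """Bin hits by diagonal, sort each bin by subject position, and keep only
    hits whose predecessor on the same diagonal is within `threshold` positions."""
    bins = defaultdict(list)
    for query_pos, subject_pos in hits:  # hits arrive in column-major (subject) order
        key = pack_hit(query_pos, subject_pos, query_len)
        bins[key >> POS_BITS].append(key)  # position-based indexing: the bin is the diagonal id

    kept = []
    for diagonal in sorted(bins):  # diagonal-major traversal
        keys = sorted(bins[diagonal])
        for prev, cur in zip(keys, keys[1:]):
            # Same diagonal, so the key difference equals the distance along the diagonal.
            if 0 < cur - prev <= threshold:
                kept.append(unpack_hit(cur, query_len))
    return kept
```

For example, with `hits = [(0, 2), (5, 3), (3, 5), (9, 7), (1, 9), (8, 16)]`, `query_len=10`, and `threshold=4`, the two hits on diagonal 8 are 7 positions apart and are dropped, while the close pairs on diagonals -2 and 2 each contribute their second hit. On a GPU, the binning and per-diagonal filtering would be spread across threads; the sketch only shows the data layout and ordering idea.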

Tags: Bioinformatics, Biology, CUDA, Next-Generation sequencing, nVidia, Sequence alignment, Tesla K20

May 23, 2014 by hgpu
