Highly efficient mapping of the Smith-Waterman algorithm on CUDA-compatible GPUs


K. Dohi, K. Benkrid, C. Ling, T. Hamada, Y. Shibata

Dept. of Comput. & Inf. Sci., Nagasaki Univ., Nagasaki, Japan

21st IEEE International Conference on Application-specific Systems Architectures and Processors (ASAP), 2010

DOI:10.1109/ASAP.2010.5540796

@inproceedings{dohi2010highly,
  title={Highly efficient mapping of the Smith-Waterman algorithm on CUDA-compatible GPUs},
  author={Dohi, K. and Benkrid, K. and Ling, C. and Hamada, T. and Shibata, Y.},
  booktitle={Application-specific Systems Architectures and Processors (ASAP), 2010 21st IEEE International Conference on},
  pages={29--36},
  year={2010},
  organization={IEEE}
}


This paper describes a multi-threaded parallel design and implementation of the Smith-Waterman (SW) algorithm on graphics processing units (GPUs) using NVIDIA Corporation's Compute Unified Device Architecture (CUDA). Central to the design is a divide-and-conquer approach that splits the computation of a whole pairwise sequence alignment matrix into multiple sub-matrices (or parallelograms), each of which runs efficiently on the hardware resources of the GPU at hand, with temporary intermediate data stored in global memory. Moreover, we use thread warps and padding techniques to decrease the cost of thread synchronization, and loop unrolling to reduce the cost of conditional branches. Although intermediate data is stored in global memory for large queries, the innermost loop of our implementation accesses only shared memory and registers. As a result of these optimizations, our implementation of the SW algorithm achieves a throughput of between 9.09 GCUPS (giga cell updates per second) and 12.71 GCUPS on a single GPU, and between 29.46 GCUPS and 43.05 GCUPS on a quad-GPU platform. Compared with the best GPU implementation of the SW algorithm reported to date, our implementation is up to 46% faster. The source code of our implementation is available in the public domain so that bioinformaticians can benefit from its performance.
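For readers unfamiliar with the terms in the abstract, the following is a minimal sequential sketch of the Smith-Waterman score recurrence and of how a GCUPS figure is derived. It is not the authors' CUDA implementation. The function names `sw_score` and `gcups`, the default scores, and the linear gap penalty are all assumptions made for illustration; the abstract does not state which scoring scheme the paper uses.

```python
def sw_score(query, target, match=2, mismatch=-1, gap=-1):
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    The scoring parameters are illustrative assumptions, not the paper's.
    Only two rows of the alignment matrix are kept in memory at a time.
    """
    prev = [0] * (len(target) + 1)  # row i-1 of the matrix
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)  # scores never drop below 0
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def gcups(query_len, target_len, seconds):
    """Giga cell updates per second: matrix cells filled / time / 1e9."""
    return query_len * target_len / seconds / 1e9


if __name__ == "__main__":
    print(sw_score("ACGT", "ACGT"))
    print(gcups(1000, 1_000_000, 0.5))
```

Each cell depends only on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal can be computed independently of one another. That dependency pattern is what makes the matrix amenable to parallel evaluation on a GPU.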

Tags: Bioinformatics, Biology, CUDA, nVidia, Sequence alignment, Smith-Waterman algorithm

May 31, 2011 by hgpu

