Improving CUDASW++, a Parallelization of Smith-Waterman for CUDA Enabled Devices
Colorado State University, Department of Computer Science, Fort Collins, CO 80523
IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011
@article{hains2011improving,
title={Improving CUDASW++, a Parallelization of Smith-Waterman for CUDA Enabled Devices},
author={Hains, D. and Cashero, Z. and Ottenberg, M. and Bohm, W. and Rajopadhye, S.},
year={2011}
}
CUDASW++ is a parallelization of the Smith-Waterman algorithm for CUDA graphics processing units that computes the similarity score of a query sequence paired with each sequence in a database. The algorithm uses one of two kernel functions to compute the score for a given pair of sequences: the inter-task kernel or the intra-task kernel. We have identified the intra-task kernel as a major bottleneck in CUDASW++ and have developed a new intra-task kernel that is faster than the original. We describe the development of our kernel as a series of incremental changes, which provide insight into a number of issues that must be considered when developing any algorithm for the CUDA architecture. We analyze the performance of our kernel compared to the original and show that using our intra-task kernel substantially improves the overall performance of CUDASW++, by roughly three to four giga-cell updates per second (GCUPS) on various benchmark databases.
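To make the quantities in the abstract concrete, the following is a minimal CPU sketch in Python, not the CUDASW++ kernels themselves. It assumes a linear gap penalty and a simple match/mismatch score (hypothetical parameter values chosen for illustration; CUDASW++ uses substitution matrices and affine gaps), and it visits the dynamic-programming matrix one anti-diagonal at a time, since cells on the same anti-diagonal are independent of one another and that independence is the kind of parallelism a single-pair (intra-task style) kernel can exploit. The helper `gcups` is likewise an illustrative name and uses the usual definition of giga-cell updates per second: query length times database residues, divided by run time and by 10^9.

```python
def sw_score(query, target, match=2, mismatch=-1, gap=-1):
    """Best Smith-Waterman local-alignment score (linear gap penalty)."""
    m, n = len(query), len(target)
    # H[i][j] = best score of a local alignment ending at
    # query[i-1] and target[j-1]; row 0 and column 0 stay at 0.
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    # Visit cells by anti-diagonal d = i + j. Each cell needs only
    # values from anti-diagonals d-1 and d-2, so all cells with the
    # same d could be updated concurrently on a GPU.
    for d in range(2, m + n + 1):
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            s = match if query[i - 1] == target[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + s,  # align query[i-1] with target[j-1]
                H[i - 1][j] + gap,    # gap in the target
                H[i][j - 1] + gap,    # gap in the query
            )
            best = max(best, H[i][j])
    return best


def gcups(query_len, db_residues, seconds):
    """Giga-cell updates per second for one query against a database."""
    return query_len * db_residues / seconds / 1e9


if __name__ == "__main__":
    print(sw_score("ACGT", "TTACGTTT"))
    print(gcups(1000, 10**9, 100.0))
```

In a database search this score is computed once per database sequence; the inter-task approach assigns whole pairs to separate threads, whereas the anti-diagonal order above shows where parallelism exists inside a single pair.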
November 5, 2011 by hgpu