Accurate Sequence Alignment using Distributed Filtering on GPU Clusters
University of Illinois at Urbana-Champaign
University of Illinois at Urbana-Champaign, Technical Report, 2011
@techreport{farivar2011accurate,
title={Accurate Sequence Alignment using Distributed Filtering on GPU Clusters},
author={Farivar, R. and Venkataraman, S. and Li, Y. and Chan, E.M. and Verma, A. and Campbell, R.},
institution={University of Illinois at Urbana-Champaign},
year={2011}
}
The advent of next-generation gene sequencing machines has led to computationally intensive alignment problems that can take many hours on a modern computer. Given the rapidly increasing rate at which new short sequences are produced, the large number of existing sequences, and inaccuracies in the sequencing machines, short sequence alignment has become a major challenge in high-performance computing. In practice, gaps as well as mismatches are found in genomic sequences, which makes alignment an edit distance problem. In this paper we describe the design of a distributed filter, based on shifted masks, that quickly reduces the number of potential matches in the presence of gaps and mismatches. Furthermore, we present a hybrid dynamic programming method, optimized for GPGPU targets, that processes the filter outputs and accurately counts the insertions, deletions, and mismatches. Finally, we present results from experiments performed on an NCSA cluster of 128 GPU units using the Hadoop framework.
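To make the edit distance formulation concrete, here is a minimal sketch of the classic Levenshtein dynamic program in Python. It is not the paper's hybrid GPGPU method or its shifted-mask filter, neither of which is specified in the abstract. The function name `edit_distance` and the example sequences are assumptions chosen for illustration only.

```python
def edit_distance(read: str, ref: str) -> int:
    """Textbook Levenshtein distance, computed one DP row at a time.

    Illustrative only: this is the standard CPU algorithm, not the
    hybrid GPU method described in the abstract.
    """
    # prev[j] = distance between the empty prefix of `read` and ref[:j]
    prev = list(range(len(ref) + 1))
    for i, a in enumerate(read, start=1):
        # curr[0] = distance between read[:i] and the empty reference
        curr = [i]
        for j, b in enumerate(ref, start=1):
            cost = 0 if a == b else 1
            curr.append(min(
                prev[j] + 1,         # read base left unmatched (gap in reference)
                curr[j - 1] + 1,     # reference base left unmatched (gap in read)
                prev[j - 1] + cost,  # match (cost 0) or mismatch (cost 1)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical short read compared against a reference fragment.
    print(edit_distance("ACGT", "AGT"))
```

The table has one cell per pair of prefix lengths, so the cost grows with the product of the two sequence lengths. That cost is why a cheap filtering pass that discards most candidate locations before the dynamic program runs is attractive.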
December 26, 2011 by hgpu