MaxSSmap: A GPU program for short read mapping with the maximum scoring subsequence
Department of Computer Science, New Jersey Institute of Technology, Newark
arXiv:1403.2136 [q-bio.GN], (10 Mar 2014)
@article{2014arXiv1403.2136R,
author={Roshan, U.},
title={"{MaxSSmap: A GPU program for short read mapping with the maximum scoring subsequence}"},
journal={ArXiv e-prints},
archivePrefix={"arXiv"},
eprint={1403.2136},
primaryClass={"q-bio.GN"},
keywords={Quantitative Biology – Genomics},
year={2014},
month={mar},
adsurl={http://adsabs.harvard.edu/abs/2014arXiv1403.2136R},
adsnote={Provided by the SAO/NASA Astrophysics Data System}
}
Exact short read mapping to whole genomes with the Smith-Waterman algorithm is computationally expensive yet highly accurate when aligning reads with mismatches and gaps. We introduce a GPU program called MaxSSmap with the aim of achieving comparable accuracy to Smith-Waterman but with faster runtimes. Similar to mainstream approaches MaxSSmap identifies a local region of the genome followed by exact alignment. Instead of using hash tables or Burrows-Wheeler in the first part, MaxSSmap calculates maximum scoring subsequence score between the read and disjoint fragments of the genome in parallel on a GPU and selects the highest scoring fragment for exact alignment. We evaluate MaxSSmap’s accuracy and runtime when mapping simulated E.coli and human reads of 10% to 30% mismatches with gaps of various lengths to the E.coli genome and human chromosome one respectively. We show that MaxSSmap attains comparable high accuracy and low error to fast Smith-Waterman programs yet has much lower runtimes. We also show that MaxSSmap can map reads rejected by fast mappers with high accuracy and low error much faster than if Smith-Waterman were used. The MaxSSmap source code is freely available.
March 12, 2014 by hgpu