G-SNPM – A GPU-based SNP mapping tool
Institute for Biomedical Technologies, National Research Council, Milano, Italy
EMBnet.journal Bioinformatics in Action, Vol. 18, 2012
@article{orro2012g,
title={G-SNPM -- A GPU-based SNP mapping tool},
author={Orro, A. and Manconi, A. and Manca, E. and Armano, G. and Milanesi, L.},
journal={EMBnet.journal},
volume={18},
number={B},
pages={pp–138},
year={2012}
}
MOTIVATION AND OBJECTIVES: In genotyping analysis, researchers often need to merge genetic datasets from different genotyping platforms, which use different sets of Single Nucleotide Polymorphisms (SNPs) to represent genetic polymorphisms. Doing so requires knowing the exact position of each SNP on its chromosome and updating that position whenever a new build of the reference genome becomes available. In this work, we present G-SNPM (GPU SNP Mapping), a GPU-based tool for mapping SNPs on a genome.
METHODS: G-SNPM maps a short sequence (read) representative of a SNP against a reference DNA sequence in order to find the absolute position of the SNP in that sequence. Several tools have been devised to perform short-read mapping. Without aiming to be exhaustive, we can cite MAQ (Li and Durbin, 2008), RMAP (Smith et al., 2008; Smith et al., 2009), Bowtie (Langmead et al., 2009), BWA (Li and Durbin, 2009), CloudBurst (Schatz, 2009), and SHRiMP (Rumble et al., 2009). A comparative study assessing the accuracy and runtime performance of six state-of-the-art next-generation sequencing read alignment tools (Ruffalo et al., 2011) found that SOAPv2 (Li et al., 2009) achieved the highest accuracy among them. More recently, SOAPv3 (Liu et al., 2012), the GPU-based evolution of the SOAPv2 aligner, has been proposed. Experimental results showed that SOAPv3 notably outperforms both BWA and Bowtie: when aligning millions of 100-bp read pairs to the human genome, it was at least 7.5 times faster than BWA and 20 times faster than Bowtie. Moreover, SOAPv3, which does not rely on heuristics, correctly aligns slightly more reads than BWA and Bowtie. The current release of SOAPv3 supports alignments with up to four mismatches but does not support indels. In G-SNPM, each SNP is mapped on its related chromosome by means of an automatic three-stage pipeline.
In the first stage, G-SNPM uses SOAPv3 to align, in parallel, the reads representative of the SNPs against their related reference chromosome. Because SOAPv3 does not support indels, it might not be able to align some reads. In the second stage, G-SNPM therefore uses another short-read mapping tool to align the unmapped reads: SHRiMP, which exploits specialized vector computing hardware to speed up the Smith-Waterman dynamic programming algorithm. Finally, in the third stage, G-SNPM analyses the alignments of the reads mapped by SOAPv3 and SHRiMP to calculate the absolute position of each SNP. It generates an output file that reports, for each SNP, its name, the related chromosome, the original SNP position, and the mapped SNP position. Information about the alignment, such as the strand and the number of mismatches and indels, is also provided (see Figure 1). G-SNPM accepts reference DNA sequences in standard FASTA format, whereas SNPs must be represented through two files: a FASTA file with the representative reads of the SNPs, and a flat file with information about each SNP, in particular its original absolute position and its alleles. Currently, automatic generation of these files is provided for SNP probes of the Illumina Chip: G-SNPM analyses Illumina files to generate the files described above for each chromosome.
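The fallback between the two aligners and the third-stage position calculation can be illustrated with a minimal Python sketch. It is not G-SNPM's actual code: the `Alignment` record, the function names, the 1-based coordinates, and the SAM-like CIGAR string (`M` for match or mismatch, `I` for insertion, `D` for deletion) are all assumptions made for illustration, and the two aligners are passed in as plain callables rather than real SOAPv3 or SHRiMP invocations.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Alignment:
    """Hypothetical alignment record (not the SOAPv3 or SHRiMP output format)."""
    start: int   # 1-based reference position of the first aligned base
    strand: str  # "+" or "-"
    cigar: str   # e.g. "7M" or "3M1D4M"


# Assumed interface: an aligner takes a read and returns an Alignment or None.
Aligner = Callable[[str], Optional[Alignment]]


def snp_reference_position(aln: Alignment, snp_offset: int,
                           read_len: int) -> Optional[int]:
    """Map the 0-based SNP offset within the read to a 1-based reference position.

    Assumption: for "-" alignments the CIGAR describes the reverse-complemented
    read, so the offset is mirrored before walking the alignment.
    Returns None if the SNP base lies inside an insertion or outside the CIGAR.
    """
    offset = snp_offset if aln.strand == "+" else read_len - 1 - snp_offset
    ref_pos, read_pos = aln.start, 0
    for count, op in re.findall(r"(\d+)([MID])", aln.cigar):
        n = int(count)
        if op == "D":                 # deletion: consumes reference bases only
            ref_pos += n
            continue
        if read_pos <= offset < read_pos + n:
            # Inside an M block the base has a reference coordinate;
            # inside an I block it does not.
            return ref_pos + (offset - read_pos) if op == "M" else None
        read_pos += n                 # M and I both consume read bases
        if op == "M":
            ref_pos += n              # only M also consumes reference bases
    return None


def map_snp(read: str, snp_offset: int,
            primary: Aligner, fallback: Aligner) -> Optional[int]:
    """Try the mismatch-only aligner first, then the gapped one if it fails."""
    aln = primary(read)
    if aln is None:
        aln = fallback(read)
    if aln is None:
        return None
    return snp_reference_position(aln, snp_offset, len(read))
```

For an ungapped forward alignment starting at position 7, a SNP at read offset 3 maps to position 10. With the alignment `3M1D4M` starting at position 3, a SNP at read offset 4 maps to position 8, because the deleted reference base shifts every later coordinate by one.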
November 14, 2012 by hgpu