Massive parallelization of combinatorial statistical genetics analyses porting machine learning methods on general purpose graphics processing units (GPU)
From Hull, Canada
Technische Universität Berlin, 2012
@phdthesis{kam2012massive,
title={Massive parallelization of combinatorial statistical genetics analyses porting machine learning methods on general purpose graphics processing units (GPU)},
author={Kam-Thong, T.},
year={2012},
school={Universit{\"a}tsbibliothek}
}
Recent advances in sequencing technology and automated phenotyping make it possible to study the relationship between genotype and phenotype at an unprecedented level of detail. While mapping phenotypes to single loci in the genome is a standard technique in statistical genetics, epistasis search, that is, mapping phenotypes to pairs of loci, remains computationally infeasible in practice. This is problematic, as epistatic interactions between loci are expected to contribute significantly to phenotypic variance. By making use of the computational power of graphics cards, we enable epistasis detection via linear and logistic regression on a single desktop machine. As graphics processing units (GPUs) become synonymous with economical and easily accessible parallel computing, they are spawning many innovative projects in several fields of study. Our group has developed new tools that use the many cores available on GPUs to solve the epistasis problem. Dedicated kernel code running on the GPU unlocks the parallel computational power of these devices and computes the statistical scores of all possible second-order interactions. The GPU-bound programs have been shown to outperform not only standard single-CPU-core approaches but also tools designed for multiple CPU cores, by up to two orders of magnitude. The tools will be of great assistance to researchers intent on performing exhaustive epistasis searches. In particular, our implementations make it possible to conduct a systematic epistasis detection study on the large body of previously published genome-wide association study (GWAS) data, including that of the Wellcome Trust Case Control Consortium (WTCCC). The vision of researchers employing no more than a single desktop computer to evaluate the statistical significance of hundreds of billions of interactions between biological inputs has become a reality. This will in turn help drive down costs and increase innovation in this field of study.
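To give a feel for the scale of an exhaustive second-order search, the following is a minimal CPU-side sketch in Python, not the thesis's GPU kernel. It scores every pair of loci by the absolute Pearson correlation between the product of the two genotype vectors and the phenotype, which is only a simple stand-in for the linear and logistic regression statistics named in the abstract. The function names, the toy data, and the figure of 500,000 SNPs are all assumptions made for illustration.

```python
from itertools import combinations
from math import comb, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def exhaustive_pair_scan(genotypes, phenotype):
    """Score every unordered pair of loci (hypothetical helper, illustration only).

    genotypes: list of loci, each a list of 0/1/2 allele counts per individual.
    Returns a dict mapping (i, j) with i < j to an interaction score.
    """
    scores = {}
    for i, j in combinations(range(len(genotypes)), 2):
        # Product term as a crude proxy for the interaction term of a regression.
        interaction = [a * b for a, b in zip(genotypes[i], genotypes[j])]
        scores[(i, j)] = abs(pearson(interaction, phenotype))
    return scores


# Toy data: 4 loci, 8 individuals, with an interaction planted between loci 0 and 1.
genotypes = [
    [0, 1, 2, 0, 1, 2, 0, 1],
    [0, 0, 1, 1, 2, 2, 0, 1],
    [2, 1, 0, 2, 1, 0, 1, 1],
    [1, 1, 1, 0, 0, 2, 2, 0],
]
phenotype = [a * b for a, b in zip(genotypes[0], genotypes[1])]

scores = exhaustive_pair_scan(genotypes, phenotype)
best_pair = max(scores, key=scores.get)

# Number of pair tests for an assumed GWAS panel of 500,000 SNPs.
pairs_for_assumed_panel = comb(500_000, 2)
```

Each pair's score is independent of every other pair's, which is what makes the problem a natural fit for one-thread-per-pair execution on a GPU. For the assumed panel of 500,000 SNPs the pair count is on the order of 10^11, which is why a sequential loop like the one above is impractical at genome-wide scale.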
September 16, 2012 by hgpu