GPU-accelerated Computation for Statistical Analysis of the Next-Generation Sequencing Data

Yilan Liu
Worcester Polytechnic Institute
Worcester Polytechnic Institute, 2014








Next-generation sequencing technologies are producing massive volumes of data and pushing the frontier of the life sciences into territory that was never imagined before. However, such big data pose great computational challenges for statistical analysis. The large throughput and massive parallelism of the Graphics Processing Unit (GPU) make it well suited to processing large data with high efficiency. In this project we develop GPU-based tools to address the statistical computation challenges in analyzing next-generation sequencing data. Our work contains three components. First, we accelerate general statistical analysis in R, a generic environment for statistical computation, which is often limited to using the Central Processing Unit (CPU) for computations. After studying various approaches to using the GPU in R, we adopted the best solution for combining R with the GPU. An R package is created to shift a set of critical R functions onto the GPU. It allows users to run R code with GPU extensions that enable much faster large-data computation. Second, we address a set of specific computation-intensive problems in simulating genetic variants in whole-genome sequencing data. A GPU-based R package is created to facilitate some typical simulations in genetic association studies. Third, we break the CPU limitation of Variant Tools, a popular toolkit for next-generation sequencing analysis, by extending its functionality to the more powerful parallel computation of the GPU. For this purpose an R-function interface is created to connect Variant Tools' sophisticated data processing and annotation to the GPU-accelerated data analysis. This work is valuable to whole-genome sequencing studies, as well as to general statistical computing needs. It is part of the research funded by the Major Research Instrumentation Program of the National Science Foundation through a grant to the WPI Department of Mathematical Sciences.
The R packages and the interfacing code as well as their documentation will be available to view and download at users.wpi.edu/~zheyangwu/.