Performance Evaluation of Quicksort with GPU Dynamic Parallelism for Gene-Expression Quantile Normalization
National Laboratory for Scientific Computing Petropolis, Petropolis 25651-075, Brazil
Journal of Communication and Computer, 10, 1522-1528, 2013
@article{souto2013performance,
title={Performance Evaluation of Quicksort with GPU Dynamic Parallelism for Gene-Expression Quantile Normalization},
author={Souto, Roberto Pinto and Osthoff, Carla and Augusto, Douglas and Trelles, Oswaldo and Vasconcelos, Ana Tereza Ribeiro de},
journal={Journal of Communication and Computer},
volume={10},
pages={1522--1528},
year={2013}
}
High-density oligonucleotide microarrays allow several million genetic markers to be observed in a single experiment. Current bioinformatics tools for gene-expression quantile data normalization are unable to process such huge data sets. At the same time, the volume of molecular data produced by high-throughput technologies in modern molecular biology has grown at a similar pace, challenging our capacity to process and understand it. Meanwhile, the arrival of CUDA (Compute Unified Device Architecture) has unveiled the power of GPUs (graphics processing units) to accelerate data-intensive general-purpose computing. In this work, we evaluate the use of dynamic parallelism, in which kernel launches can be managed not only by the host but also by the device, for sorting gene-expression data. Each sample has more than 6.5 million genes. We optimized the parallel Quicksort implementation available in the CUDA-5.5 Toolkit Samples and compared its performance with that of the sequential Quicksort algorithm from the GNU C Library (glibc) and of the parallel radix sort implementation available in the CUDPP-2.1 library. The parallel Quicksort implementation is designed to run on the Kepler GPU architecture, which supports dynamic parallelism. The results show that, in the studied application, the GPU parallel version with dynamic parallelism attains speed-ups in the data-sorting step. However, achieving an effective overall speed-up, considering the radix sort algorithm, requires further optimization of the whole application.
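To make the role of the sorting step concrete, the sketch below shows quantile normalization in its commonly described form: sort each sample, average the values at each rank across samples, then write that average back to every position holding that rank. The abstract does not spell out these steps, so the function name `quantile_normalize`, the list-of-lists layout, and the tie handling (ties broken by position, with no averaging) are illustrative assumptions rather than the authors' implementation. The sort is done on the CPU with Python's built-in `sorted`; it only marks where a GPU sort would be substituted.

```python
from typing import List


def quantile_normalize(samples: List[List[float]]) -> List[List[float]]:
    """Quantile-normalize equally sized samples (one inner list per sample).

    Hypothetical helper for illustration; ties are broken by position.
    """
    if not samples:
        return []
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of values")

    # Sorting step: for each sample, the indices that order its values
    # ascending. This is the step the abstract moves to the GPU.
    orders = [sorted(range(n), key=s.__getitem__) for s in samples]

    # Reference distribution: mean across samples of the value at each rank.
    reference = [
        sum(s[order[rank]] for s, order in zip(samples, orders)) / len(samples)
        for rank in range(n)
    ]

    # Write the reference value for each rank back to the original position.
    normalized = []
    for order in orders:
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    # Two toy samples with three values each.
    print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
    # prints [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization every sample holds the same set of values, only in a different order. With millions of values per sample, the per-sample sort accounts for a large share of the work in this sketch, which is consistent with the abstract's focus on the data-sorting step.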
May 29, 2014 by hgpu