high performance computing on graphics processing units: hgpu.org

Performance Evaluation of Quicksort with GPU Dynamic Parallelism for Gene-Expression Quantile Normalization

Roberto Pinto Souto, Carla Osthoff, Douglas Augusto, Oswaldo Trelles, Ana Tereza Ribeiro de Vasconcelos

National Laboratory for Scientific Computing Petropolis, Petropolis 25651-075, Brazil

Journal of Communication and Computer, 10, 1522-1528, 2013

High-density oligonucleotide microarrays allow several million genetic markers to be observed in a single experiment. Current bioinformatics tools for gene-expression quantile normalization are unable to process data sets of this size. At the same time, the volume of molecular data produced by high-throughput technologies in modern molecular biology has grown at a similar pace, challenging our capacity to process and understand it. Meanwhile, the arrival of CUDA (Compute Unified Device Architecture) has made the power of GPUs (graphics processing units) available for accelerating data-intensive general-purpose computing. In this work, we evaluate the use of dynamic parallelism for sorting gene-expression data, where kernel launches can be managed not only by the host but also by the device. Each sample has more than 6.5 million genes. We optimized the parallel Quicksort implementation available in the CUDA 5.5 Toolkit Samples and compared its performance with that of the sequential Quicksort from the GNU C Library (glibc) and the parallel radix sort implementation available in the CUDPP 2.1 library. The parallel Quicksort implementation is designed to run on the Kepler GPU architecture, which supports dynamic parallelism. The results show that, in the studied application, the GPU version with dynamic parallelism attains speed-ups in the data-sorting step. However, to achieve an effective overall speed-up considering the radix sort algorithm, the performance of the whole application needs further optimization.
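
To show where the data-sorting step sits in the workflow, here is a minimal sketch of textbook quantile normalization in plain Python. It is not the authors' CUDA implementation. The function name `quantile_normalize` and the list-of-lists layout are assumptions made for this example, and ties are broken by original position rather than averaged. The per-sample sort in step 1 is the part that the paper offloads to the GPU.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length samples (one list of values per sample).

    Illustrative sketch only: ties are broken by original position.
    """
    n = len(samples[0])

    # Step 1 (the sorting step): for each sample, the indices that would sort it.
    orders = [sorted(range(n), key=sample.__getitem__) for sample in samples]

    # Step 2: average the values found at each rank across all samples.
    rank_means = [
        sum(sample[order[r]] for sample, order in zip(samples, orders)) / len(samples)
        for r in range(n)
    ]

    # Step 3: replace every value by the mean of its rank.
    normalized = [[0.0] * n for _ in samples]
    for out, order in zip(normalized, orders):
        for rank, original_index in enumerate(order):
            out[original_index] = rank_means[rank]
    return normalized


if __name__ == "__main__":
    print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```

After normalization every sample contains the same set of values, so their distributions are identical. With millions of values per sample, as in the abstract, the cost of step 1 grows quickly, which is why the choice of sorting algorithm matters.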

Tags: Algorithms, Bioinformatics, Biology, CUDA, nVidia, Sorting, Tesla K20

May 29, 2014 by hgpu
