high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Biology » Accelerating bioinformatics applications on CUDA-enabled multi-GPU systems

Accelerating bioinformatics applications on CUDA-enabled multi-GPU systems

Robin Kobus

Department of Physics, Mathematics, and Computer Science, Johannes Gutenberg University, Mainz

Johannes Gutenberg University, Mainz, 2023

DOI:10.25358/openscience-9634

BibTeX

Download (PDF)

View

Source

Source codes

Package:

gossip: Efficient Communication Primitives for Multi-GPU Systems

1440

views

A wide range of bioinformatics applications have to deal with a continuously growing amount of data generated by high-throughput sequencing techniques. Exclusively CPU-based workstations fail to keep up with the task. Instead of employing dozens of CPU cluster nodes to increase the computational power, massively parallel accelerators like modern CUDA-enabled GPUs can be used to achieve higher throughput and reduce execution times. However, memory capacity of such devices is often limited. Efficient parallelization and data distribution are essential to accelerate performance critical components of bioinformatics pipelines like read classification and read mapping. In this thesis we analyze and optimize tasks common to many GPU-based applications in the context of bioinformatics. We study sequence processing, construction and querying of k-mer-based hash tables, segmented sort as well as multi-GPU communication. With these methods we accelerate suffix array construction and metagenomic read classification on CUDA-enabled GPUs by overcoming the aforementioned challenges. By leveraging multiple GPUs, we extend the limited memory available from a single GPU to allow for the construction of larger indices. Our communication library, called Gossip, introduces optimized scatter, gather and all-to-all patterns for multi-GPU systems. Gossip’s all-to-all communication pattern is successfully applied to suffix array construction, accelerating it to run in 3.44 s for a full-length human genome on an 8-GPU server, which is faster than previously reported 4.8 seconds achieved by employing 1600 cores on 100 nodes on a CPU-based HPC cluster. Furthermore, we introduce MetaCache-GPU — an ultra-fast metagenomic short read classifier specifically tailored to fit the characteristics of CUDA-enabled accelerators. Our approach employs a novel hash table variant featuring efficient minhash fingerprinting of reads for locality-sensitive hashing and their rapid insertion using warp-aggregated operations. Our performance evaluation shows that MetaCache-GPU is able to build large reference databases in a matter of seconds, enabling instantaneous operability, while popular CPU-based tools such as Kraken2 require over an hour for index construction on the same data. In the light of an ever-growing number of reference genomes, MetaCache-GPU is the first metagenomic classifier that makes analysis pipelines with on-demand composition of large-scale reference genome sets practical. Although many sub-problems in this thesis are optimized in a specific application context, they also apply to other bioinformatics problems like k-mer counting, sequence alignment and assembly, which would benefit from GPU acceleration. In addition to the insights from this work, we make our source code publicly available to allow for easier adaptation of our methods to related problems.

Tags: Bioinformatics, Biology, CUDA, Databases, Hashing, nVidia, nVidia Quadro GV100, Package, Sequence alignment, Tesla P100, Tesla V100, Thesis

November 27, 2023 by hgpu

Rating: 5.0/5. From 1 vote.

Please wait...

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Engineering Supercomputing Platforms for Biomolecular Applications

high performance computing on graphics processing units: hgpu.org

Accelerating bioinformatics applications on CUDA-enabled multi-GPU systems

Package:

Recent source codes

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

SYCL Container

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

Most viewed papers (last 30 days)

Accelerating bioinformatics applications on CUDA-enabled multi-GPU systems

Package:

Share this:

Recent source codes

Most viewed papers (last 30 days)