high performance computing on graphics processing units: hgpu.org


Hardware/Software Co-Design for Data-Intensive Genomics Workloads

Nicola Cadenelli

Universitat Politècnica de Catalunya – Barcelona

Universitat Politècnica de Catalunya, 2019


Over the last decade, the main components of computer systems have been evolving and diversifying to overcome their physical limits and to minimize their energy footprint. Hardware specialization and heterogeneity have become key to designing more efficient systems and tackling ever-important problems with ever-larger volumes of data. However, to take full advantage of the new hardware, a tighter integration between hardware and software, called hardware/software co-design, is also needed. Hardware/software co-design is a time-consuming process that poses its own challenges, such as code and performance portability. Despite these challenges and its considerable costs, it is an effort that is crucial for data-intensive applications that run at scale. Such applications span many fields, such as engineering, chemistry, life sciences, astronomy, high energy physics, and earth sciences.

Another scientific field where hardware/software co-design is fundamental is genomics. Here, modern DNA sequencing technologies have reduced sequencing time and made sequencing orders of magnitude cheaper than it was just a few years ago. This breakthrough, together with novel genomics methods, will eventually enable the long-awaited personalized medicine. Personalized medicine selects appropriate and optimal therapies based on the context of a patient’s genome, and it has the potential to change medical treatments as we know them today. However, the broad adoption of genomics methods is limited by their capital and operational costs. Genomics pipelines consist of complex algorithms with execution times of many hours per patient and vast intermediate data structures that are kept in main memory for good performance. To satisfy the main memory requirement, genomics applications are usually scaled out to multiple compute nodes. Therefore, these workloads require infrastructures of enterprise-class servers, with entry and running costs that most labs, clinics, and hospitals cannot afford. For these reasons, co-designing genomics workloads to lower their total cost of ownership is essential and worth investigating.

This thesis demonstrates that hardware/software co-design allows migrating data-intensive genomics applications to inexpensive desktop-class machines to reduce the total cost of ownership when compared to traditional cluster deployments. Firstly, the thesis examines algorithmic improvements to ease co-design and to reduce the workload footprint, using NVMs as a memory extension, so that the workload can run on a single node. Secondly, it investigates how data-intensive algorithms can offload computation to programmable accelerators (i.e., GPUs and FPGAs) to reduce the execution time and the energy-to-solution. Thirdly, it explores and proposes techniques to substantially reduce the memory footprint through the adoption of flash memory, to the point that genomics methods can run on one affordable desktop-class machine.

To demonstrate this thesis, we co-design SMUFIN, a state-of-the-art real-world genomics method that was originally deployed on 16 nodes of MareNostrum 3, where, for each patient, it needed around 10 hours and 56 kWh to complete its execution. Thanks to algorithmic improvements, an NVM used as a main memory extension, and accelerators, we made it possible to execute SMUFIN on a single enterprise node with 512 GB of main memory in 9 hours and as few as 4.3 kWh, a 13.1x improvement. However, we were able to run SMUFIN on a desktop-class machine only thanks to the adoption of NVMe as an alternative to main memory. On this affordable node with a 6-core i7 and only 32 GB of main memory, SMUFIN suffers a considerable slow-down, requiring 22.4 hours, but it consumes only 2.4 kWh, a 23.3x improvement compared to the original deployment. Compared to the single enterprise node, this desktop machine costs only 1/4 as much and requires only approximately 1/2 of the energy per patient. As a result, a cluster of multiple desktop-class machines costs half as much as a cluster of servers and consumes half as much energy while maintaining a similar throughput. These results prove that hardware/software co-design allows significant reductions in the total cost of ownership of data-intensive genomics methods, easing their adoption on large repositories of genomes and also in the field.
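The energy and cost comparisons in the abstract can be sanity-checked with a few lines of arithmetic. The sketch below is a back-of-the-envelope check using only the rounded figures quoted above; it is not the thesis's own cost model. The dictionary and function names are illustrative, and the throughput model (one patient at a time per machine, so throughput is the inverse of execution time) is an assumption that the abstract does not state.

```python
# Per-patient figures quoted in the abstract (rounded values).
ORIGINAL = {"hours": 10.0, "kwh": 56.0}    # 16 MareNostrum 3 nodes ("around 10 hours")
ENTERPRISE = {"hours": 9.0, "kwh": 4.3}    # single enterprise node, 512 GB of main memory
DESKTOP = {"hours": 22.4, "kwh": 2.4}      # desktop-class node, 32 GB of main memory

# Quoted in the abstract: the desktop costs about 1/4 of the enterprise node.
DESKTOP_COST_FRACTION = 0.25


def energy_improvement(baseline, candidate):
    """How many times less energy per patient the candidate needs."""
    return baseline["kwh"] / candidate["kwh"]


def machines_for_equal_throughput(reference, candidate):
    """Candidate machines needed to match one reference machine.

    Assumption (not from the source): each machine processes one patient
    at a time, so per-machine throughput is 1 / hours.
    """
    return candidate["hours"] / reference["hours"]


desktops_per_server = machines_for_equal_throughput(ENTERPRISE, DESKTOP)
relative_cluster_cost = desktops_per_server * DESKTOP_COST_FRACTION
relative_energy = DESKTOP["kwh"] / ENTERPRISE["kwh"]

print(f"enterprise vs original energy: {energy_improvement(ORIGINAL, ENTERPRISE):.1f}x")
print(f"desktop vs original energy:    {energy_improvement(ORIGINAL, DESKTOP):.1f}x")
print(f"desktops per enterprise node:  {desktops_per_server:.2f}")
print(f"relative cluster cost:         {relative_cluster_cost:.2f}")
print(f"relative energy per patient:   {relative_energy:.2f}")
```

With these rounded inputs the desktop improvement comes out at about 23.3x, matching the abstract, while the enterprise-node figure comes out at about 13.0x rather than the quoted 13.1x; the small gap presumably reflects rounding in the quoted measurements. Under the simple throughput assumption, roughly 2.5 desktops replace one enterprise node, giving a relative cluster cost of about 0.62 and a relative energy per patient of about 0.56, which is broadly in line with the "half as much" claims.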

Tags: Algorithms, Computer science, FPGA, nVidia, OpenCL, performance portability, Tesla K40, Thesis

January 26, 2020 by hgpu
