
Hardware/Software Co-Design for Data-Intensive Genomics Workloads

Nicola Cadenelli
Universitat Politècnica de Catalunya – Barcelona
Universitat Politècnica de Catalunya, 2019

@phdthesis{cadenelli2019hardware,

   title={Hardware/software co-design for data-intensive genomics workloads},

   author={Cadenelli, Nicola},

   year={2019},

   school={Universitat Polit{\`e}cnica de Catalunya}

}



Over the last decade, the main components of computer systems have been evolving and diversifying to overcome their physical limits and to minimize their energy footprint. Hardware specialization and heterogeneity have become key to designing more efficient systems and to tackling ever-important problems with ever-larger volumes of data. However, to fully take advantage of the new hardware, a tighter integration between hardware and software, called hardware/software co-design, is also needed. Hardware/software co-design is a time-consuming process that poses its own challenges, such as code and performance portability. Despite these challenges and its considerable cost, co-design is crucial for data-intensive applications that run at scale. Such applications span many fields, including engineering, chemistry, life sciences, astronomy, high-energy physics, and earth sciences.

Genomics is another scientific field where hardware/software co-design is fundamental. Modern DNA sequencing technologies have reduced sequencing time and made its cost orders of magnitude cheaper than it was just a few years ago. This breakthrough, together with novel genomics methods, will eventually enable the long-awaited personalized medicine, which selects appropriate and optimal therapies based on the context of a patient's genome and has the potential to change medical treatments as we know them today. However, the broad adoption of genomics methods is limited by their capital and operational costs: genomics pipelines consist of complex algorithms with execution times of many hours per patient and vast intermediate data structures that must reside in main memory for good performance. To satisfy the main-memory requirement, genomics applications are usually scaled out to multiple compute nodes. These workloads therefore require infrastructures of enterprise-class servers, with entry and running costs that most labs, clinics, and hospitals cannot afford. For these reasons, co-designing genomics workloads to lower their total cost of ownership is essential and worth investigating.

This thesis demonstrates that hardware/software co-design allows migrating data-intensive genomics applications to inexpensive desktop-class machines, reducing the total cost of ownership compared to traditional cluster deployments. First, the thesis examines algorithmic improvements that ease co-design and reduce the workload footprint, using non-volatile memories (NVMs) as a main-memory extension, so that a workload can run on a single node. Second, it investigates how data-intensive algorithms can offload computation to programmable accelerators (i.e., GPUs and FPGAs) to reduce execution time and energy-to-solution. Third, it explores and proposes techniques to substantially reduce the memory footprint through the adoption of flash memory, to the point that genomics methods can run on one affordable desktop-class machine.

To demonstrate the thesis, we co-design SMUFIN, a state-of-the-art real-world genomics method originally deployed on 16 nodes of MareNostrum 3, where it needed around 10 hours and 56 kWh per patient. Thanks to algorithmic improvements, an NVM used as a main-memory extension, and accelerators, we made it possible to execute SMUFIN on a single enterprise node with 512 GB of main memory in 9 hours and as little as 4.3 kWh per patient, a 13.1x improvement.
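
The core flash-as-memory idea can be illustrated with the standard Linux mmap interface: a large data structure is backed by a file on the NVMe drive and accessed as if it were ordinary memory, with the kernel paging chunks in and out of DRAM on demand. The following C++ sketch shows this general technique only; the file path, table size, and access pattern are illustrative assumptions, not SMUFIN's actual implementation.

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main() {
        // A 64 GiB table: far larger than the 32 GB of DRAM on a desktop
        // node, but fine as a (sparse) file on an NVMe drive.
        const size_t entries = 1ull << 33;                 // hypothetical size
        const size_t bytes   = entries * sizeof(uint64_t);

        int fd = open("/mnt/nvme/index.bin", O_RDWR | O_CREAT, 0644);
        if (fd < 0 || ftruncate(fd, bytes) != 0) {
            perror("open/ftruncate");
            return 1;
        }

        // MAP_SHARED makes the file itself the backing store, so evicted
        // dirty pages are written to flash instead of filling DRAM or swap.
        uint64_t* table = static_cast<uint64_t*>(
            mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));
        if (table == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        // Genomics indexes are typically accessed at random; telling the
        // kernel so avoids wasted readahead from the NVMe drive.
        madvise(table, bytes, MADV_RANDOM);

        table[123456789] += 1;   // used exactly like in-DRAM memory

        munmap(table, bytes);
        close(fd);
        return 0;
    }

The trade-off is latency: DRAM is replaced by much cheaper flash, which is what makes the desktop-class deployment below slower but far less expensive per patient.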
However, we were able to run SMUFIN on a desktop-class machine only thanks to the adoption of NVMe flash as an alternative to main memory. On this affordable node, with a 6-core i7 and only 32 GB of main memory, SMUFIN suffers a considerable slowdown, requiring 22.4 hours per patient, but it consumes only 2.4 kWh, a 23.3x improvement compared to the original deployment. Compared to the single enterprise node, the desktop machine costs only a quarter as much and requires roughly half the energy per patient; since about 2.5 desktop machines are needed to match the throughput of one server, a cluster of desktop-class machines costs about half as much as a cluster of servers and consumes half as much energy while maintaining similar throughput. These results prove that hardware/software co-design allows significant reductions in the total cost of ownership of data-intensive genomics methods, easing their adoption both on large repositories of genomes and in the field.
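
As a back-of-the-envelope check, the improvement factors and the cluster-level cost claim follow directly from the per-patient figures above; the sketch below only re-derives numbers already stated, under the stated assumption that a desktop node costs one quarter of an enterprise node.

    #include <cstdio>

    int main() {
        // Per-patient figures quoted above.
        const double cluster_kwh   = 56.0;   // 16 nodes of MareNostrum 3
        const double server_kwh    = 4.3;    // enterprise node, 512 GB RAM
        const double desktop_kwh   = 2.4;    // desktop node, 32 GB RAM + NVMe
        const double server_hours  = 9.0;
        const double desktop_hours = 22.4;

        // Energy-to-solution improvements over the original deployment.
        printf("server:  %.1fx\n", cluster_kwh / server_kwh);     // ~13.0x
        printf("desktop: %.1fx\n", cluster_kwh / desktop_kwh);    // ~23.3x

        // Matching one server's throughput takes ~2.5 desktops; at 1/4 the
        // price each, the desktop cluster costs ~0.62x, i.e. about half.
        const double n = desktop_hours / server_hours;             // ~2.49
        printf("cost ratio:   %.2f\n", n * 0.25);
        printf("energy ratio: %.2f\n", desktop_kwh / server_kwh);  // ~0.56
        return 0;
    }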