high performance computing on graphics processing units: hgpu.org

High-throughput Analysis of Large Microscopy Image Datasets on CPU-GPU Cluster Platforms

George Teodoro, Tony Pan, Tahsin M. Kurc, Jun Kong, Lee A. D. Cooper, Norbert Podhorszki, Scott Klasky, Joel H. Saltz

Center for Comprehensive Informatics, Emory University, Atlanta, GA

Emory University, Center for Comprehensive Informatics, Technical Report CCI-TR-2012-9, 2012

Analysis of large pathology image datasets offers significant opportunities for biomedical researchers to investigate the morphology of disease, but the resource requirements of image analyses limit the scale of those studies. Motivated by such a study, we propose and evaluate a parallel image analysis application pipeline for high-throughput computation of large datasets of high-resolution pathology tissue images on distributed CPU-GPU platforms. To achieve efficient execution on these hybrid systems, we built runtime support that allows us to express our cancer image analysis application as a hierarchical pipeline: the application is implemented as a coarse-grain pipeline of stages, and each stage may be further partitioned into another pipeline of fine-grain operations. The fine-grain operations are managed and scheduled for computation on CPUs and GPUs using performance-aware scheduling techniques along with several optimizations, including architecture-aware process placement, data-locality-conscious task assignment, data prefetching, and asynchronous data copy. These optimizations are employed to maximize utilization of the aggregate computing power of CPUs and GPUs and to minimize data copy overheads. The results, obtained with the analysis application for the study of brain tumors, show that cooperative use of CPUs and GPUs achieves significant improvements over GPU-only versions (up to 1.6x), and that executing the application as a set of fine-grain operations provides more opportunities for runtime optimizations and attains better performance than the coarser-grain, monolithic implementations used in other works. Moreover, the cancer image analysis pipeline processed an image dataset consisting of 36,848 4Kx4K-pixel image tiles (about 1.8 TB uncompressed) in less than 4 minutes (about 150 tiles/second) on 100 nodes of a state-of-the-art hybrid cluster system.
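One way to picture performance-aware scheduling of fine-grain operations is the greedy sketch below. It is a toy simulation under stated assumptions, not the runtime described in the report: each operation is assumed to carry an estimated CPU time and an estimated GPU speedup, an idle GPU takes the pending operation with the highest estimated speedup, and an idle CPU core takes the one with the lowest. The names `Op`, `schedule`, and `OPS`, and all timings, are hypothetical and chosen only for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    cpu_time: float     # hypothetical estimated seconds on one CPU core
    gpu_speedup: float  # hypothetical estimate of cpu_time / gpu_time


def schedule(ops, n_cpus, n_gpus):
    """Simulate greedy assignment of independent ops to CPU cores and GPUs.

    Returns (assignment, makespan), where assignment maps each op name to
    a (device_kind, device_id) pair.
    """
    if n_cpus + n_gpus == 0:
        raise ValueError("at least one device is required")
    # Pending ops sorted by ascending speedup: CPUs take from the front,
    # GPUs take from the back.
    pending = sorted(ops, key=lambda op: op.gpu_speedup)
    # Min-heap of (time_device_becomes_free, device_id, kind).
    # GPUs get the lower ids, so they win ties at equal times.
    devices = [(0.0, i, "gpu") for i in range(n_gpus)]
    devices += [(0.0, n_gpus + i, "cpu") for i in range(n_cpus)]
    heapq.heapify(devices)

    assignment = {}
    makespan = 0.0
    while pending:
        free_at, dev_id, kind = heapq.heappop(devices)
        if kind == "gpu":
            op = pending.pop()       # highest estimated speedup
            duration = op.cpu_time / op.gpu_speedup
        else:
            op = pending.pop(0)      # lowest estimated speedup
            duration = op.cpu_time
        finish = free_at + duration
        assignment[op.name] = (kind, dev_id)
        makespan = max(makespan, finish)
        heapq.heappush(devices, (finish, dev_id, kind))
    return assignment, makespan


# Hypothetical operations; the numbers are invented for this example.
OPS = [
    Op("A", cpu_time=8.0, gpu_speedup=16.0),
    Op("B", cpu_time=4.0, gpu_speedup=8.0),
    Op("C", cpu_time=4.0, gpu_speedup=4.0),
    Op("D", cpu_time=2.0, gpu_speedup=2.0),
    Op("E", cpu_time=1.0, gpu_speedup=1.0),
]

if __name__ == "__main__":
    _, gpu_only = schedule(OPS, n_cpus=0, n_gpus=1)
    plan, hybrid = schedule(OPS, n_cpus=1, n_gpus=1)
    print("GPU-only makespan:", gpu_only)
    print("CPU+GPU makespan: ", hybrid)
    print("CPU+GPU plan:     ", plan)
```

In this toy setting the CPU core contributes by absorbing the operations that benefit least from the GPU, which is the intuition behind cooperative CPU-GPU execution. The sketch ignores dependencies between operations, data transfer costs, and locality, all of which the abstract lists as concerns of the actual runtime.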

Tags: CUDA, GPU cluster, Image processing, Microscopy, nVidia, Prefetch, Tesla M2090

January 6, 2013 by hgpu
