High-throughput Analysis of Large Microscopy Image Datasets on CPU-GPU Cluster Platforms
Center for Comprehensive Informatics, Emory University, Atlanta, GA
Emory University, Center for Comprehensive Informatics, Technical Report CCI-TR-2012-9, 2012
@techreport{teodoro2012high,
title={High-throughput Analysis of Large Microscopy Image Datasets on CPU-GPU Cluster Platforms},
author={Teodoro, G. and Pan, T. and Kurc, T.M. and Kong, J. and Cooper, L.A.D. and Podhorszki, N. and Klasky, S. and Saltz, J.H.},
institution={Center for Comprehensive Informatics, Emory University},
number={CCI-TR-2012-9},
year={2012}
}
Analysis of large pathology image datasets offers significant opportunities for biomedical researchers to investigate the morphology of disease, but the resource requirements of image analyses limit the scale of those studies. Motivated by such a study, we propose and evaluate a parallel image analysis application pipeline for high-throughput computation of large datasets of high-resolution pathology tissue images on distributed CPU-GPU platforms. To achieve efficient execution on these hybrid systems, we built runtime support that allows us to express our cancer image analysis application as a hierarchical pipeline, in which the application is implemented as a coarse-grain pipeline of stages and each stage may be further partitioned into another pipeline of fine-grain operations. The fine-grain operations are efficiently managed and scheduled for computation on CPUs and GPUs using performance-aware scheduling techniques along with several optimizations, including architecture-aware process placement, data locality conscious task assignment, and data prefetching and asynchronous data copy. These optimizations are employed to maximize utilization of the aggregate computing power of CPUs and GPUs and to minimize data copy overheads. The results, obtained with the analysis application for the study of brain tumors, show that cooperative use of CPUs and GPUs achieves significant improvements over GPU-only versions (up to 1.6x), and that executing the application as a set of fine-grain operations provides more opportunities for runtime optimizations and attains better performance than the coarser-grain, monolithic implementations used in other works. Moreover, the cancer image analysis pipeline was able to process an image dataset consisting of 36,848 4Kx4K-pixel image tiles (about 1.8TB uncompressed) in less than 4 minutes (150 tiles/second) on 100 nodes of a state-of-the-art hybrid cluster system.
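To give a rough sense of the performance-aware scheduling idea described in the abstract, the C++ sketch below assigns fine-grain operations to CPU or GPU workers based on an estimated per-operation GPU speedup: the GPU worker pulls the operations expected to benefit most from the accelerator, while the CPU worker takes the least GPU-friendly ones. This is a minimal illustration under assumed names and speedup values, not the paper's implementation; the operation list is hypothetical and actual kernel launches are replaced by a print statement.

// Minimal sketch (not the paper's code) of performance-aware scheduling of
// fine-grain operations on a hybrid CPU-GPU node. Each operation carries an
// estimated GPU speedup; the GPU worker pulls the operations that benefit
// most from the GPU, the CPU worker pulls the ones that benefit least.
// Operation names and speedup numbers are illustrative assumptions.
#include <algorithm>
#include <cstdio>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

struct Operation {
    std::string name;
    double gpu_speedup;  // estimated speedup of the GPU over the CPU version
};

class PerformanceAwareQueue {
public:
    void push(Operation op) {
        std::lock_guard<std::mutex> lk(m_);
        ops_.push_back(std::move(op));
        // Keep operations ordered by estimated GPU speedup (ascending).
        std::sort(ops_.begin(), ops_.end(),
                  [](const Operation& a, const Operation& b) {
                      return a.gpu_speedup < b.gpu_speedup;
                  });
    }
    // GPU workers take from the back (highest speedup), CPU workers from
    // the front (lowest speedup). Returns false when the queue is empty.
    bool pop(bool for_gpu, Operation* out) {
        std::lock_guard<std::mutex> lk(m_);
        if (ops_.empty()) return false;
        if (for_gpu) { *out = ops_.back(); ops_.pop_back(); }
        else         { *out = ops_.front(); ops_.erase(ops_.begin()); }
        return true;
    }
private:
    std::mutex m_;
    std::vector<Operation> ops_;
};

void worker(PerformanceAwareQueue& q, bool is_gpu) {
    Operation op;
    while (q.pop(is_gpu, &op)) {
        // A real runtime would launch a CUDA kernel or a CPU implementation here.
        std::printf("%s executes %s (est. GPU speedup %.1fx)\n",
                    is_gpu ? "GPU" : "CPU", op.name.c_str(), op.gpu_speedup);
    }
}

int main() {
    PerformanceAwareQueue q;
    // Hypothetical fine-grain operations from a tile-processing stage.
    q.push({"color normalization", 2.0});
    q.push({"morphological reconstruction", 7.5});
    q.push({"distance transform", 6.0});
    q.push({"feature computation", 1.5});

    std::vector<std::thread> workers;
    workers.emplace_back(worker, std::ref(q), true);   // one GPU worker
    workers.emplace_back(worker, std::ref(q), false);  // one CPU worker
    for (auto& t : workers) t.join();
    return 0;
}

In this simplified form, the scheduling policy amounts to sorting operations by estimated GPU speedup and serving the two processor types from opposite ends of the queue; data locality, process placement, and asynchronous copies from the paper are not modeled here.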