Efficient Irregular Wavefront Propagation Algorithms on Hybrid CPU-GPU Machines

George Teodoro, Tony Pan, Tahsin Kurc, Jun Kong, Lee Cooper, Joel Saltz
Center for Comprehensive Informatics and Biomedical Informatics Department, Emory University, Atlanta, GA 30322
arXiv:1209.3314 [cs.DC] (14 Sep 2012)

@article{2012arXiv1209.3314T,
   author={Teodoro, George and Pan, Tony and Kurc, Tahsin and Kong, Jun and Cooper, Lee and Saltz, Joel},
   title={Efficient Irregular Wavefront Propagation Algorithms on Hybrid CPU-GPU Machines},
   journal={ArXiv e-prints},
   archivePrefix={arXiv},
   eprint={1209.3314},
   primaryClass={cs.DC},
   keywords={Distributed, Parallel, and Cluster Computing; Systems and Control},
   year={2012},
   month={sep}
}


In this paper, we address the problem of efficiently executing a computation pattern, referred to here as the irregular wavefront propagation pattern (IWPP), on hybrid systems with multiple CPUs and GPUs. The IWPP is common in several image processing operations. In the IWPP, data elements in the wavefront propagate waves to their neighboring elements on a grid if a propagation condition is satisfied. Elements receiving the propagated waves become part of the wavefront. This pattern results in irregular data accesses and computations. We develop and evaluate strategies for efficient computation and propagation of wavefronts using a multi-level queue structure. This queue structure improves the utilization of fast memories in a GPU and reduces synchronization overheads. We also develop a tile-based parallelization strategy to support execution on multiple CPUs and GPUs. We evaluate our approaches on a state-of-the-art GPU-accelerated machine (equipped with 3 GPUs and 2 multicore CPUs) using the IWPP implementations of two widely used image processing operations: morphological reconstruction and Euclidean distance transform. Our results show significant performance improvements on GPUs. The cooperative use of multiple CPUs and GPUs attains speedups of 50x and 85x with respect to single-core CPU executions for morphological reconstruction and Euclidean distance transform, respectively.
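
The propagation pattern described in the abstract can be illustrated with a minimal sequential sketch of grayscale morphological reconstruction. This is not the authors' GPU implementation: it assumes a single FIFO queue in place of the multi-level queue, 4-connected neighbors, and images stored as lists of lists. The names `reconstruct`, `marker`, and `mask` are hypothetical and chosen only for this example.

```python
from collections import deque


def reconstruct(marker, mask):
    """Grayscale morphological reconstruction by dilation (4-connected).

    Illustrative sequential sketch of the wavefront pattern; the function
    name and data layout are assumptions, not taken from the paper.
    """
    rows, cols = len(mask), len(mask[0])
    # The output starts as the marker, clipped so it never exceeds the mask.
    out = [[min(marker[r][c], mask[r][c]) for c in range(cols)]
           for r in range(rows)]

    # Initial wavefront: every pixel is a candidate source of propagation.
    wavefront = deque((r, c) for r in range(rows) for c in range(cols))

    while wavefront:
        r, c = wavefront.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Value the neighbor would take if the wave reaches it.
                candidate = min(out[r][c], mask[nr][nc])
                # Propagation condition: the wave raises the neighbor.
                if candidate > out[nr][nc]:
                    out[nr][nc] = candidate
                    # The neighbor joins the wavefront.
                    wavefront.append((nr, nc))
    return out


if __name__ == "__main__":
    mask = [[3, 3, 1, 4, 4]]
    marker = [[3, 0, 0, 0, 0]]
    print(reconstruct(marker, mask))  # [[3, 3, 1, 1, 1]]
```

In the usage example, the wave starting at the leftmost pixel is capped by the mask value 1 in the middle, so pixels to its right cannot rise above 1. Which pixels enter the queue depends on the image content, which is the source of the irregular data accesses the abstract mentions.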