Load Balancing versus Occupancy Maximization on Graphics Processing Units: The Generalized Hough Transform as a Case Study

Juan Gomez-Luna, Jose Maria Gonzalez-Linares, Jose Ignacio Benavides, Emilio L. Zapata, Nicolas Guil

Computer Architecture and Electronics Department, University of Cordoba, Cordoba, Spain

International Journal of High Performance Computing Applications, May 2011, vol. 25, no. 2, 205-222

DOI:10.1177/1094342010383998

@article{gomez2011load,
  title={Load Balancing versus Occupancy Maximization on Graphics Processing Units: The Generalized Hough Transform as a Case Study},
  author={G{\'o}mez-Luna, J. and Gonz{\'a}lez-Linares, J.M. and Benavides, J.I. and Zapata, E.L. and Guil, N.},
  journal={International Journal of High Performance Computing Applications},
  volume={25},
  number={2},
  pages={205--222},
  year={2011},
  publisher={SAGE Publications}
}

Programs developed under the Compute Unified Device Architecture (CUDA) achieve their highest performance when the exploitation of hardware resources on a Graphics Processing Unit (GPU) is maximized. To this end, load balancing among threads and high processor occupancy, i.e. a high ratio of active threads, are indispensable. However, in certain applications, an optimally balanced implementation may limit occupancy because it needs more registers and shared memory. This is the case for the Fast Generalized Hough Transform (Fast GHT), an image-processing technique for localizing an object within an image. In this work, we present two parallelization alternatives for the Fast GHT: one that optimizes load balancing and another that maximizes occupancy. We have compared them on a large number of real images to identify their strengths and weaknesses, and we have drawn several conclusions about the conditions under which each is preferable. We have also tackled several parallelization problems related to sparse data distribution, divergent execution paths, and irregular memory access patterns in updating operations by proposing a set of generic techniques, including compacting, sorting, and memory storage replication. Finally, we have compared our Fast GHT with the classic GHT, both on a current GPU, obtaining a significant speed-up.
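
The tension described above, where a kernel that uses more registers and shared memory leaves fewer threads active, can be illustrated with a rough occupancy estimate. The following is only a sketch and is not taken from the paper: the per-multiprocessor limits in `SMLimits` are assumed illustrative values, the function and parameter names are hypothetical, and the model ignores the allocation granularity that real hardware applies to registers and shared memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    """Assumed per-multiprocessor limits (illustrative, not from the paper)."""
    max_threads: int = 1536        # resident threads per multiprocessor
    max_blocks: int = 8            # resident blocks per multiprocessor
    registers: int = 32768         # 32-bit registers per multiprocessor
    shared_mem_bytes: int = 49152  # shared memory per multiprocessor


def occupancy(threads_per_block, regs_per_thread, smem_per_block,
              limits=SMLimits()):
    """Estimate occupancy as active threads / maximum resident threads.

    Simplified model: the number of resident blocks is the minimum of the
    limits imposed by thread count, register use, shared memory use, and
    the block cap. Allocation granularity is deliberately ignored.
    """
    if threads_per_block <= 0 or regs_per_thread <= 0:
        raise ValueError("threads_per_block and regs_per_thread must be positive")

    by_threads = limits.max_threads // threads_per_block
    by_regs = limits.registers // (regs_per_thread * threads_per_block)
    # A kernel that uses no shared memory is not limited by it.
    if smem_per_block > 0:
        by_smem = limits.shared_mem_bytes // smem_per_block
    else:
        by_smem = limits.max_blocks

    blocks = min(limits.max_blocks, by_threads, by_regs, by_smem)
    return blocks * threads_per_block / limits.max_threads


# A lean kernel: few registers and little shared memory per block.
lean = occupancy(threads_per_block=256, regs_per_thread=16, smem_per_block=4096)

# A heavier kernel, e.g. one that keeps extra state to balance the load.
heavy = occupancy(threads_per_block=256, regs_per_thread=40, smem_per_block=16384)

print(f"lean kernel occupancy:  {lean:.2f}")
print(f"heavy kernel occupancy: {heavy:.2f}")
```

Under these assumed limits, the heavier kernel is restricted to three resident blocks by both its register and shared-memory demands, so half of the thread slots stay idle. Whether the better load balance outweighs that loss is the question the paper examines empirically.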

Tags: Computer science, CUDA, Image processing, nVidia, Programming techniques

August 22, 2011 by hgpu
