high performance computing on graphics processing units: hgpu.org

Parallelization of the Generalized Hough Transform on GPU

Juan Gomez-Luna, Jose Maria Gonzalez-Linares, Jose Ignacio Benavides, Emilio L. Zapata, Nicolas Guil

Computer Architecture and Electronics Department, University of Cordoba, Cordoba, Spain

XXII Jornadas de Paralelismo, 2011

@article{gomez2011parallelization,
  title={Parallelization of the Generalized Hough Transform on GPU},
  author={G{\'o}mez-Luna, J. and Gonz{\'a}lez-Linares, J.M. and Benavides, J.I. and Zapata, E.L. and Guil, N.},
  year={2011}
}

Programs developed under the Compute Unified Device Architecture (CUDA) achieve their highest performance when the hardware resources of a Graphics Processing Unit (GPU) are fully exploited. To that end, load balancing among threads and high processor occupancy, i.e. the ratio of active threads, are indispensable. However, in certain applications an optimally balanced implementation may limit the occupancy, because it needs more registers and shared memory. This is the case for the Fast Generalized Hough Transform (Fast GHT), an image processing technique for localizing an object within an image. In this work, we present two parallelization alternatives for the Fast GHT, one that optimizes load balancing and another that maximizes occupancy. We have compared them on a large number of real images to identify their strengths and weaknesses, and we have drawn several conclusions about the conditions under which each is preferable. We have also tackled several parallelization problems related to sparse data distribution, divergent execution paths and irregular memory access patterns in updating operations by proposing a set of generic techniques such as compacting, sorting and memory storage replication.
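The trade-off between per-thread resources and occupancy described in the abstract can be illustrated with a rough calculation. The sketch below is not taken from the paper: it is a simplified model that ignores allocation granularity and other hardware details, and the per-multiprocessor limits are assumed values for a compute capability 1.3 device such as the GeForce GTX 280 named in the tags. The names `SMLimits` and `occupancy`, and the two kernel configurations, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    """Per-multiprocessor limits (assumed values for compute capability 1.3)."""
    max_threads: int = 1024        # resident threads per multiprocessor
    max_blocks: int = 8            # resident blocks per multiprocessor
    registers: int = 16384         # 32-bit registers per multiprocessor
    shared_mem_bytes: int = 16384  # shared memory per multiprocessor


def occupancy(threads_per_block, regs_per_thread, smem_per_block, sm=SMLimits()):
    """Estimate the fraction of thread slots that can be resident at once.

    Simplified model: the number of resident blocks is the tightest of the
    block, thread, register and shared-memory limits. Real hardware rounds
    allocations up, so measured occupancy may be lower.
    """
    limits = [sm.max_blocks, sm.max_threads // threads_per_block]
    if regs_per_thread > 0:
        limits.append(sm.registers // (regs_per_thread * threads_per_block))
    if smem_per_block > 0:
        limits.append(sm.shared_mem_bytes // smem_per_block)
    resident_blocks = min(limits)
    return resident_blocks * threads_per_block / sm.max_threads


# Hypothetical kernels: one with modest resource needs, one that uses twice
# the registers and four times the shared memory per block.
light = occupancy(threads_per_block=256, regs_per_thread=16, smem_per_block=2048)
heavy = occupancy(threads_per_block=256, regs_per_thread=32, smem_per_block=8192)
print(light, heavy)  # 1.0 0.5
```

Under these assumptions, doubling the registers per thread halves the number of resident blocks, which is the kind of pressure that a better balanced but more resource-hungry kernel can put on occupancy.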

Tags: CUDA, Image processing, nVidia, nVidia GeForce GTX 280, Sorting

November 3, 2011 by hgpu

