high performance computing on graphics processing units: hgpu.org


Image processing algorithm optimization with CUDA for Pure Data

Rudi Giot, Abilio Rodrigues e Sousa

Laras – ISIB, 150, Rue Royale, Bruxelles, Belgique, 1000

4th international Pure Data Convention (PDCON), 2011

@article{giot2011image,
  title={Image processing algorithm optimization with CUDA for Pure Data},
  author={Giot, Rudi and Sousa, Abilio Rodrigues e},
  year={2011}
}


Production lines featuring industrial vision are becoming increasingly widespread. This kind of automation needs systems that can capture pictures, analyze them and learn from them in order to take appropriate action. These processes are often computationally heavy and are applied to high-definition images at high frame rates, so powerful computers are needed to keep up with ever-growing production rates. NVIDIA's CUDA[1] provides an interface for parallel data computation, which can increase the performance of any system equipped with a graphics processing unit (GPU). A CUDA program is made up of two parts: one running on the host (CPU) and the other on the device (GPU). The non-parallelizable stages of the program run on the host, while the parallelizable ones run on the device. Pure Data, thanks to its graphical, modular development environment, allows fast prototype development. These factors led us to start a research program dedicated to building image processing modules for Pure Data written in CUDA. We will first adapt the most frequently used algorithms, which already exist within the GEM library.

Our first results are encouraging. For instance, for the conversion of an RGB image to greyscale, tests show that GPU computing yields an average speedup factor of 109 compared with CPU-only computing. However, a CPU + GPU architecture has a weakness: data transfers between the host's local memory and the graphics card. Most of the computation time (more than 90%) is spent on those transfers, because each CUDA function block in Pure Data performs a double transfer, from CPU to GPU and back. Performance is therefore not optimal, and future work on the project will aim to minimize those transfers. The idea is to have a single transfer from CPU to GPU at the start of the program, and a single transfer back at the end carrying the result of the whole process.

In conclusion, running image processing algorithms on the graphics card is a very effective solution for complex processing. Integrating CUDA blocks into Pure Data simplifies and accelerates application prototyping. This approach suits any field requiring a high frame rate, a high resolution, a large number of operations or computation-intensive processes, including industrial, medical and artistic applications.
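
To see why removing the per-block transfers matters, the transfer argument in the abstract can be turned into a small back-of-the-envelope model. The Python sketch below is an illustration only, not the authors' measurement code. The function name `pipeline_times` is hypothetical, and the model assumes that every block in a chain costs the same, that exactly 90% of a block's current runtime is its upload plus download (the abstract states "more than 90%"), and that one upload and one download for the whole chain cost the same as a single block's pair of transfers.

```python
def pipeline_times(n_blocks, transfer_fraction=0.9):
    """Compare two transfer schemes for a chain of identical CUDA blocks.

    Times are expressed in units of one block's current runtime
    (upload + kernel + download = 1.0). Both the 0.9 default and the
    "identical blocks" simplification are assumptions of this sketch.
    Returns (per_block_scheme_time, single_transfer_scheme_time).
    """
    if n_blocks < 1:
        raise ValueError("n_blocks must be at least 1")
    transfer = transfer_fraction        # upload + download of one image
    compute = 1.0 - transfer_fraction   # kernel time of one block
    # Current scheme: every block uploads its input and downloads its output.
    per_block = n_blocks * (transfer + compute)
    # Proposed scheme: one upload at the start, one download at the end.
    single = transfer + n_blocks * compute
    return per_block, single


for n in (1, 2, 5, 10):
    per_block, single = pipeline_times(n)
    print(f"{n:2d} blocks: {per_block:5.2f} vs {single:5.2f} "
          f"(x{per_block / single:.2f})")
```

Under these assumptions, a chain of five blocks takes 5.0 time units with per-block transfers and 1.4 with a single upload and download, roughly a 3.6x reduction. As the chain grows, the gain approaches a factor of 10, because only kernel time remains to be paid per block.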

Tags: Algorithm optimization, Algorithms, CUDA, Image processing, nVidia, Optimization

December 24, 2011 by hgpu
