high performance computing on graphics processing units: hgpu.org

hgpu.org » Programming » Algorithms » Image processing algorithm optimization with CUDA for Pure Data

Image processing algorithm optimization with CUDA for Pure Data

Rudi Giot, Abilio Rodrigues e Sousa

Laras – ISIB, 150, Rue Royale, Bruxelles, Belgique, 1000

4th international Pure Data Convention (PDCON), 2011

BibTeX

Download (PDF)

View

Source

2142

views

Image Processing Production lines featuring industrial vision are becoming more and more widespread. That kind of automation needs systems able to capture pictures, analyze and learn from them in order to take appropriate action. These processes are often heavy and applied to high-definition images with important frame rate. Powerful calculators are thus needed to follow the ever growing production rate. NVIDIA is currently designing interfaces providing a CUDA[1] allowing parallel data computation. This could increase the performance of every operating system using graphical processing units (GPU). A CUDA program is made up of two parts: one running on the host (CPU) and the other exploiting the device (GPU). The non-parallelizable stages of the program are run on the host, while the parallelizable ones are run on the device. Pure Data, thanks to its graphical modular development environment, allows fast prototype development. Those factors led us to start a research program dedicated to the realization of image processing modules for Pure Data written in CUDA. First, we will adapt the most often used algorithms (already existing within the GEM library). Our first results are encouraging. For instance, regarding RGB image conversion to grey scale image, tests demonstrate that GPU computing grants an average accelerating factor of 109 comparing to the "only-CPU" based computing. However a CPU + GPU architecture has a weakness regarding data transfers between the local memory and the graphics card. Most of the computation time (more than 90%) is spent on those transfers. There is indeed a double transfer between CPU and GPU for each CUDA function block in Pure Data. Considering this, performance is not optimal. We will thus spend some time in the future of the project to minimize those transfers. The idea is to have one first transfer, from CPU to GPU, at the start of the program and one second backward transfer at the end containing the result from the whole process. In conclusion, image processing algorithms by the graphics card is a really effective solution for complex processing. Integrating CUDA blocks inside Pure Data facilitates and accelerates the prototyping of applications. This would suit every field requiring a high frame rate, a high resolution, an important amount of operations or computation-greedy processes. It does include use for industry, medical or artistic purposes.

Tags: Algorithm optimization, Algorithms, CUDA, Image processing, nVidia, Optimization

December 24, 2011 by hgpu

No votes yet.

Please wait...

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Engineering Supercomputing Platforms for Biomolecular Applications

high performance computing on graphics processing units: hgpu.org

Image processing algorithm optimization with CUDA for Pure Data

Recent source codes

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

SYCL Container

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

Most viewed papers (last 30 days)

Image processing algorithm optimization with CUDA for Pure Data

Share this:

Recent source codes

Most viewed papers (last 30 days)