Efficient Mapping of Streaming Applications for Image Processing on Graphics Cards

Richard Membarth, Hritam Dutta, Frank Hannig, Jürgen Teich
Hardware/Software Co-Design, Department of Computer Science, University of Erlangen-Nuremberg, Germany
Transactions on High-Performance Embedded Architectures and Compilers (Transactions on HiPEAC), 5(3) 2011

@article{membarth2011efficient,
   title={Efficient Mapping of Streaming Applications for Image Processing on Graphics Cards},
   author={Membarth, R. and Dutta, H. and Hannig, F. and Teich, J.},
   journal={Transactions on High-Performance Embedded Architectures and Compilers (Transactions on HiPEAC)},
   volume={5},
   number={3},
   year={2011}
}

In the last decade, there has been dramatic growth in research and development of massively parallel commodity graphics hardware, both in academia and industry. Graphics card architectures provide an optimal platform for the parallel execution of many number-crunching loop programs from fields such as image processing or linear algebra. However, it is hard to map such algorithms efficiently to graphics hardware, even with detailed insight into the architecture. This paper presents a multiresolution image processing algorithm and shows how this type of algorithm can be mapped efficiently to graphics hardware, as well as double buffering concepts to hide memory transfers. Furthermore, the impact of the execution configuration is illustrated, and a method is proposed to determine the best configuration offline. Using CUDA as the programming model, it is demonstrated that the image processing algorithm is significantly accelerated and that a speedup of more than 145x can be achieved on NVIDIA’s Tesla C1060 compared to a parallelized implementation on a Xeon Quad Core. For deployment in a streaming application with continuously arriving data, it is shown that the memory transfer overhead to the graphics card is reduced by a factor of six using double buffering.
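The double buffering mentioned above can be realized in CUDA with two streams and page-locked (pinned) host memory, so that the transfer of one frame overlaps with the computation on the previous one. The following minimal sketch is not the authors' implementation; the kernel, frame size, and names (process_frame, FRAME_SIZE, NUM_FRAMES) are illustrative assumptions.

// Hedged sketch: double buffering of frame transfers with two CUDA streams.
// process_frame, FRAME_SIZE, and NUM_FRAMES are illustrative, not from the paper.
#include <cuda_runtime.h>

#define FRAME_SIZE (1024 * 1024)   // pixels per frame (assumed)
#define NUM_FRAMES 16              // number of frames in the input stream (assumed)

// Placeholder kernel standing in for one stage of the multiresolution filter.
__global__ void process_frame(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 0.5f * in[i];   // dummy computation
}

int main() {
    float *h_in, *h_out;                 // pinned host buffers, required for async copies
    cudaMallocHost(&h_in,  NUM_FRAMES * FRAME_SIZE * sizeof(float));
    cudaMallocHost(&h_out, NUM_FRAMES * FRAME_SIZE * sizeof(float));

    float *d_in[2], *d_out[2];           // two device buffers: the "double buffer"
    cudaStream_t stream[2];
    for (int b = 0; b < 2; ++b) {
        cudaMalloc(&d_in[b],  FRAME_SIZE * sizeof(float));
        cudaMalloc(&d_out[b], FRAME_SIZE * sizeof(float));
        cudaStreamCreate(&stream[b]);
    }

    dim3 block(256);                                   // execution configuration (assumed)
    dim3 grid((FRAME_SIZE + block.x - 1) / block.x);

    for (int f = 0; f < NUM_FRAMES; ++f) {
        int b = f & 1;                                 // alternate between the two buffers
        // Copy-in, kernel, and copy-out of frame f are issued into stream b; while
        // they execute, frame f+1 is already being enqueued into the other stream,
        // so host-device transfers overlap with computation on the previous frame.
        cudaMemcpyAsync(d_in[b], h_in + f * FRAME_SIZE,
                        FRAME_SIZE * sizeof(float), cudaMemcpyHostToDevice, stream[b]);
        process_frame<<<grid, block, 0, stream[b]>>>(d_in[b], d_out[b], FRAME_SIZE);
        cudaMemcpyAsync(h_out + f * FRAME_SIZE, d_out[b],
                        FRAME_SIZE * sizeof(float), cudaMemcpyDeviceToHost, stream[b]);
    }
    cudaDeviceSynchronize();

    for (int b = 0; b < 2; ++b) {
        cudaFree(d_in[b]); cudaFree(d_out[b]); cudaStreamDestroy(stream[b]);
    }
    cudaFreeHost(h_in); cudaFreeHost(h_out);
    return 0;
}

Alternating the stream per frame keeps the copy engine and the compute units busy at the same time, which is the effect the paper exploits to hide the transfer overhead in the streaming setting.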
