high performance computing on graphics processing units: hgpu.org

Investigating performance variations of an optimized GPU-ported granulometry algorithm

Vincent Boulos, Vincent Fristot, Dominique Houzet, Luc Salvo, Pierre Lhuissier

GIPSA-lab, UMR5216 CNRS/INPG/UJF/U.Stendhal, F-38402 GRENOBLE CEDEX, France

Conference on Design and Architectures for Signal and Image Processing (DASIP), 2012

@inproceedings{boulos2012investigating,
  title={Investigating performance variations of an optimized GPU-ported granulometry algorithm},
  author={Boulos, Vincent and Fristot, Vincent and Houzet, Dominique and Salvo, Luc and Lhuissier, Pierre},
  booktitle={Design and Architectures for Signal and Image Processing (DASIP), 2012 Conference on},
  pages={1--6},
  year={2012},
  organization={IEEE}
}

In this article, we present an optimized GPU implementation of a granulometry algorithm that is widely used in the study of materials. The main contribution to this algorithm is the binarization of the input data, which increases throughput while reducing the memory space allocated to the data. The optimized GPU implementation also brings an order-of-magnitude speedup over a multi-threaded CPU implementation. Furthermore, we investigate why GPU performance drops for different input data dimensions. Three main factors are identified: under-exploited threads, thread blocks, and streaming multiprocessors. This study should help the reader understand the tight relationship between the CUDA programming paradigm and the GPU architecture, as well as some of the main bottlenecks.
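The two ideas in the abstract can be illustrated with a small host-side sketch in Python. It is not the authors' implementation: the 32-bit, least-significant-bit-first packing layout, the one-work-item-per-thread mapping, and the simple "wave" model of block scheduling are all assumptions made for illustration, and the names `pack_bits` and `launch_utilization` are hypothetical. Assuming the unpacked data used one byte per voxel, packing one bit per voxel would cut memory by a factor of eight. The second function estimates how much of a launch is wasted when the input size does not divide evenly into thread blocks, or when the block count does not fill the streaming multiprocessors.

```python
import math


def pack_bits(voxels):
    """Pack a flat sequence of 0/1 voxels into 32-bit words, LSB first.

    Assumed layout for illustration only; the paper's layout may differ.
    """
    words = [0] * math.ceil(len(voxels) / 32)
    for i, v in enumerate(voxels):
        if v:
            words[i // 32] |= 1 << (i % 32)
    return words


def launch_utilization(n_items, threads_per_block, n_sms, blocks_per_sm=1):
    """Estimate utilization of a 1-D launch with one work item per thread.

    Returns (thread_util, wave_util):
      thread_util: fraction of launched threads that have real work
      wave_util:   fraction of block slots filled, assuming blocks run in
                   waves of n_sms * blocks_per_sm (a simplified model)
    """
    blocks = math.ceil(n_items / threads_per_block)
    thread_util = n_items / (blocks * threads_per_block)
    slots = n_sms * blocks_per_sm
    waves = math.ceil(blocks / slots)
    wave_util = blocks / (waves * slots)
    return thread_util, wave_util


if __name__ == "__main__":
    print(pack_bits([1, 0, 1, 1]))
    # 1000 items, 256 threads per block, 15 SMs (illustrative values)
    print(launch_utilization(1000, 256, 15))
```

With these illustrative numbers, 1000 items need 4 blocks of 256 threads, so 24 of the 1024 launched threads are idle, and only 4 of 15 block slots are occupied under the one-block-per-SM assumption. Real occupancy depends on registers, shared memory, and the hardware scheduler, which this model ignores.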

Tags: Algorithm optimization, Algorithms, CUDA, FEM, Finite element method, Image processing, Materials Science, nVidia, nVidia GeForce GTX 285, nVidia GeForce GTX 480, nVidia Quadro FX 4000

February 22, 2013 by hgpu
