high performance computing on graphics processing units: hgpu.org


Acceleration of bilateral filtering algorithm for manycore and multicore architectures

Dinesh Agarwal, Sami Wilf, Abinashi Dhungel, Sushil K. Prasad

Georgia State University, Department of Computer Science, Atlanta, Georgia 30303

Georgia State University, 2012

@article{agarwal2012acceleration,
  title={Acceleration of bilateral filtering algorithm for manycore and multicore architectures},
  author={Agarwal, D. and Wilf, S. and Dhungel, A. and Prasad, S.K.},
  year={2012}
}


Package:

blfilter: Bilateral filtering kernel on multicores and manycores


This work explores multicore and manycore acceleration for the embarrassingly parallel, compute-intensive bilateral filtering kernel. For manycore architectures, we have created a pair-symmetric algorithm to avoid redundant calculations. For multicore architectures, we improve the algorithm by using low-level single-instruction-multiple-data (SIMD) parallelism across multiple threads. We propose architecture-specific optimizations, such as exploiting the unique capabilities of special registers available in modern multicore architectures and rearranging data access patterns to match the computations so that special-purpose instructions can be exploited. We also propose optimizations pertinent to Nvidia’s CUDA, including utilization of CUDA’s implicit synchronization capability and the maximization of single-instruction-multiple-thread efficiency. We present empirical data on the performance gains we achieved over a variety of hardware architectures, including Nvidia GTX280, AMD Barcelona, AMD Shanghai, Intel Harpertown, AMD Phenom, Intel Core i7 quad core, and Intel Nehalem 32 core machines. The best speedups achieved were (i) 235.5x by our CUDA-based implementation of our pair-symmetric algorithm running on Nvidia’s GTX280 GPU and (ii) up to 38x using 16 cores of AMD Barcelona, each with a 4-stage vector pipeline, compared to compiler-optimized code.
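The abstract does not spell out the pair-symmetric algorithm, so the following is only a minimal serial sketch of one plausible reading of it. It assumes the standard Gaussian bilateral weight, which is the same for the pair (p, q) as for (q, p), and so can be computed once and credited to both pixels. The function names, the parameters `radius`, `sigma_s`, and `sigma_r`, and the test image are all hypothetical, and the sketch does not attempt to show how the authors map the work onto CUDA threads or SIMD lanes.

```python
import math

def weight(img, y, x, yy, xx, sigma_s, sigma_r):
    """Gaussian bilateral weight; symmetric in the two pixels (assumed form)."""
    spatial = ((y - yy) ** 2 + (x - xx) ** 2) / (2.0 * sigma_s ** 2)
    rng = (img[y][x] - img[yy][xx]) ** 2 / (2.0 * sigma_r ** 2)
    return math.exp(-(spatial + rng))

def bilateral_naive(img, radius, sigma_s, sigma_r):
    """Reference filter: every pixel evaluates its full window."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    evals = 0  # number of weight evaluations, for comparison
    for y in range(h):
        for x in range(w):
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        wgt = weight(img, y, x, yy, xx, sigma_s, sigma_r)
                        evals += 1
                        num += wgt * img[yy][xx]
                        den += wgt
            out[y][x] = num / den
    return out, evals

def bilateral_pair_symmetric(img, radius, sigma_s, sigma_r):
    """Visit each unordered pixel pair once and update both accumulators."""
    h, w = len(img), len(img[0])
    num = [[0.0] * w for _ in range(h)]
    den = [[0.0] * w for _ in range(h)]
    evals = 0
    # "Forward" half of the window: offsets after the centre in raster order.
    half = [(dy, dx)
            for dy in range(0, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy > 0 or dx > 0]
    for y in range(h):
        for x in range(w):
            # The centre pixel pairs with itself at weight exp(0) = 1.
            num[y][x] += img[y][x]
            den[y][x] += 1.0
            for dy, dx in half:
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:
                    wgt = weight(img, y, x, yy, xx, sigma_s, sigma_r)
                    evals += 1
                    num[y][x] += wgt * img[yy][xx]
                    den[y][x] += wgt
                    num[yy][xx] += wgt * img[y][x]  # reuse for the partner pixel
                    den[yy][xx] += wgt
    out = [[num[y][x] / den[y][x] for x in range(w)] for y in range(h)]
    return out, evals

# Small hypothetical test image.
img = [[float((3 * y + 5 * x) % 7) for x in range(5)] for y in range(4)]
naive_out, naive_evals = bilateral_naive(img, 2, 1.5, 2.0)
sym_out, sym_evals = bilateral_pair_symmetric(img, 2, 1.5, 2.0)
print(naive_evals, sym_evals)
```

Both functions produce the same filtered image up to floating-point rounding, while the pair-symmetric version evaluates the exponential fewer than half as many times. In a parallel setting the two-sided update means different workers write to the same accumulator, so some form of coordination between them is needed; the sketch sidesteps this by running serially.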

Tags: Algorithms, Computer science, CUDA, Filtering, nVidia, nVidia GeForce GTX 280, Optimization, Package

July 2, 2012 by hgpu

