Acceleration of bilateral filtering algorithm for manycore and multicore architectures
Georgia State University, Department of Computer Science, Atlanta, Georgia 30303
Georgia State University, 2012
@article{agarwal2012acceleration,
title={Acceleration of bilateral filtering algorithm for manycore and multicore architectures},
author={Agarwal, D. and Wilf, S. and Dhungel, A. and Prasad, S.K.},
year={2012}
}
This work explores multicore and manycore acceleration of the embarrassingly parallel, compute-intensive bilateral filtering kernel. For manycore architectures, we have created a pair-symmetric algorithm that avoids redundant calculations. For multicore architectures, we improve the algorithm by using low-level single instruction, multiple data (SIMD) parallelism across multiple threads. We propose architecture-specific optimizations, such as exploiting the unique capabilities of special registers available in modern multicore architectures and rearranging data access patterns to match the computations so that special-purpose instructions can be exploited. We also propose optimizations pertinent to Nvidia’s CUDA, including utilization of CUDA’s implicit synchronization capability and maximization of single-instruction-multiple-thread (SIMT) efficiency. We present empirical data on the performance gains achieved over a variety of hardware architectures, including Nvidia GTX280, AMD Barcelona, AMD Shanghai, Intel Harpertown, AMD Phenom, Intel Core i7 quad-core, and Intel Nehalem 32-core machines. The best speedups achieved were (i) 235.5x by our CUDA-based implementation of our pair-symmetric algorithm running on Nvidia’s GTX280 GPU and (ii) up to 38x using 16 cores of AMD Barcelona, each with a 4-stage vector pipeline, compared to compiler-optimized code.
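The abstract does not spell out the pair-symmetric algorithm, so the following is only a minimal sequential sketch of the symmetry it presumably relies on: the bilateral weight between two pixels p and q is the same in both directions, so it can be computed once and accumulated into both outputs. The function names, the Gaussian form of the weight, and the parameters `radius`, `sigma_s`, and `sigma_r` are assumptions made for illustration. This is not the authors' CUDA implementation, and it says nothing about how they map the work onto GPU threads.

```python
import math


def weight(img, y, x, yy, xx, sigma_s, sigma_r):
    """Bilateral weight between pixels (y, x) and (yy, xx).

    Assumed form: Gaussian in spatial distance times Gaussian in
    intensity difference. The value is identical if the two pixels
    are swapped, which is the symmetry the sketch exploits.
    """
    d2 = (yy - y) ** 2 + (xx - x) ** 2
    r2 = (img[yy][xx] - img[y][x]) ** 2
    return math.exp(-d2 / (2.0 * sigma_s ** 2) - r2 / (2.0 * sigma_r ** 2))


def bilateral_naive(img, radius, sigma_s, sigma_r):
    """Reference filter: every pixel evaluates its full window."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        wgt = weight(img, y, x, yy, xx, sigma_s, sigma_r)
                        num += wgt * img[yy][xx]
                        den += wgt
            out[y][x] = num / den
    return out


def bilateral_pair_symmetric(img, radius, sigma_s, sigma_r):
    """Same filter, but each unordered pixel pair is weighted once."""
    h, w = len(img), len(img[0])
    # Each pixel paired with itself has weight exp(0) = 1.
    num = [[float(img[y][x]) for x in range(w)] for y in range(h)]
    den = [[1.0] * w for _ in range(h)]
    # Only the "forward" half of the window: rows below the pixel,
    # plus the pixels to its right on the same row.
    half = [(dy, dx)
            for dy in range(0, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy > 0 or dx > 0]
    for y in range(h):
        for x in range(w):
            for dy, dx in half:
                yy, xx = y + dy, x + dx
                if yy < h and 0 <= xx < w:
                    wgt = weight(img, y, x, yy, xx, sigma_s, sigma_r)
                    # One weight evaluation updates both pixels of the pair.
                    num[y][x] += wgt * img[yy][xx]
                    den[y][x] += wgt
                    num[yy][xx] += wgt * img[y][x]
                    den[yy][xx] += wgt
    return [[num[y][x] / den[y][x] for x in range(w)] for y in range(h)]
```

In this sketch the pair-symmetric version evaluates roughly half as many weights as the naive one and should agree with it up to floating-point rounding. A parallel version would also have to handle concurrent updates to the neighbouring pixel, which the sketch ignores.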
July 2, 2012 by hgpu