Connecting Architecture, Fitness, Optimizations and Performance using an Anisotropic Diffusion Filter

Sumedh Naik
Clemson University
Clemson University, 2012







Over the past decade, computing architectures have continued to exploit multiple levels of parallelism in applications. This increased interest in parallel computing has not only fueled the growth of multi-core processors but has also led to the emergence of several non-traditional computing architectures, such as General-Purpose Graphics Processing Units (GP-GPUs), Cell processors, and Field-Programmable Gate Arrays (FPGAs). Of these non-traditional architectures, GP-GPUs have gained widespread popularity due to their massively parallel computational abilities and relative ease of programming. Several software development ecosystems have emerged to harness the power of these parallel architectures. Although several parallel programming libraries, such as POSIX Threads, OpenMP, and MPI, are available for multi-core processors, support for GP-GPUs remains limited to just two frameworks: Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL). These libraries and frameworks each provide a powerful set of programming features that directly influence application performance. In this work, we characterize the behavior of an anisotropic diffusion filter and identify the hardware bottlenecks that limit its performance. We chose an image processing filter for this study because of its massively parallel nature. We then use a recently developed fitness model from the literature to predict the fitness of this algorithm for the selected architectures and identify the causes of its failure. We also report and analyze how performance varies with problem size, available optimization techniques, and execution configurations. We observe a best runtime of 3156 ms on the multi-core processors and 55.66 ms on the GP-GPUs.
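Taken at face value, the two best runtimes differ by a factor of roughly 57. This is only a back-of-the-envelope ratio: the abstract does not say whether both figures come from the same problem size or execution configuration, so it should not be read as a controlled speedup measurement.

```latex
$$
S = \frac{T_{\text{multi-core}}}{T_{\text{GP-GPU}}}
  = \frac{3156\ \text{ms}}{55.66\ \text{ms}}
  \approx 56.7
$$
```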
Our results and analysis highlight different architecture-specific optimization techniques and, using a performance prediction model, identify the best match among the selected architectures for this algorithm.
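The abstract does not say which anisotropic diffusion formulation the thesis uses. The sketch below assumes the widely used Perona-Malik scheme with the exponential edge-stopping function g(x) = exp(-(x/kappa)^2), an explicit 4-neighbour update with step size lam <= 1/4, and zero-flux borders; the function names and defaults (`kappa=20.0`, `lam=0.25`) are illustrative choices, not taken from the thesis. It shows why the filter suits parallel hardware: within one iteration, each output pixel depends only on the previous iteration's image, so every pixel update is independent.

```python
import math
from typing import List

Image = List[List[float]]  # row-major grayscale image


def conduction(grad: float, kappa: float) -> float:
    """Perona-Malik edge-stopping function g(x) = exp(-(x / kappa)^2)."""
    return math.exp(-((grad / kappa) ** 2))


def diffuse_step(img: Image, kappa: float, lam: float) -> Image:
    """One explicit 4-neighbour Perona-Malik iteration with zero-flux borders.

    Each output pixel reads only the previous image, so every (i, j) update
    is independent and could run in its own thread.
    """
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            c = img[i][j]
            # Differences to the north/south/west/east neighbours; a missing
            # neighbour at the border contributes a zero difference (no flux).
            dn = img[i - 1][j] - c if i > 0 else 0.0
            ds = img[i + 1][j] - c if i < rows - 1 else 0.0
            dw = img[i][j - 1] - c if j > 0 else 0.0
            de = img[i][j + 1] - c if j < cols - 1 else 0.0
            # Large differences (likely edges) get a near-zero coefficient,
            # so smoothing happens within regions rather than across edges.
            flux = sum(conduction(abs(d), kappa) * d for d in (dn, ds, dw, de))
            out[i][j] = c + lam * flux
    return out


def anisotropic_diffusion(
    img: Image, iterations: int, kappa: float = 20.0, lam: float = 0.25
) -> Image:
    """Return the result of `iterations` diffusion steps; `img` is not modified."""
    if kappa <= 0:
        raise ValueError("kappa must be positive")
    if not 0.0 < lam <= 0.25:
        raise ValueError("lam must be in (0, 0.25] for the explicit 4-neighbour scheme")
    for _ in range(iterations):
        img = diffuse_step(img, kappa, lam)
    return img
```

For example, a step edge such as `[[0.0, 0.0, 100.0, 100.0]]` is left essentially intact with `kappa=5.0` but is visibly smoothed with `kappa=1000.0`, because the conduction coefficient collapses toward zero across large gradients. On a GPU, the double loop in `diffuse_step` would typically become one thread per pixel, with a global synchronization point (for instance, a separate kernel launch) between iterations.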
