Connecting Architecture, Fitness, Optimizations and Performance using an Anisotropic Diffusion Filter

Sumedh Naik

Clemson University

Clemson University, 2012

@phdthesis{naik2012connecting,
  title={Connecting Architecture, Fitness, Optimizations and Performance using an Anisotropic Diffusion Filter},
  author={Naik, Sumedh},
  year={2012},
  school={Clemson University}
}

Over the past decade, computing architectures have continued to exploit multiple levels of parallelism in applications. This increased interest in parallel computing has not only fueled the growth of multi-core processors but has also led to the emergence of several non-traditional computing architectures such as General-Purpose Graphics Processing Units (GP-GPUs), Cell processors, and Field Programmable Gate Arrays (FPGAs). Of these non-traditional computing architectures, GP-GPUs have gained widespread popularity due to their massively parallel computational abilities and relative ease of programmability. Several software development ecosystems have emerged to harness the power of these parallel architectures. Although several threading libraries such as POSIX Threads, OpenMP, and MPI are available for multi-core processors, support for GP-GPUs remains limited to just two frameworks: Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL). These threading libraries and frameworks each provide a powerful set of programming features that directly influence application performance. In this work, we characterize the behavior of an anisotropic diffusion filter and identify the hardware bottlenecks that limit the performance of the filter. We choose an image processing filtering algorithm for this study owing to its massively parallel nature. We then utilize a recently developed fitness model from the literature to predict the fitness of this algorithm for the selected architectures and identify the causes of its failure. We also report and analyze the variation of performance with problem size scaling, available optimization techniques, and execution configurations. We observe a best runtime of 3156 ms on the multi-core processors and 55.66 ms on the GP-GPUs. Our results and analysis highlight different architecture-specific optimization techniques and identify the best match out of the selected architectures for this algorithm using a performance prediction model.
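To illustrate why the abstract describes the filter as massively parallel, here is a minimal sketch of one explicit anisotropic diffusion step. The abstract does not specify the formulation used in the thesis, so the sketch assumes the classic Perona-Malik scheme with an exponential edge-stopping function. The function name `perona_malik_step` and the parameters `kappa` (edge threshold) and `lam` (step size) are hypothetical choices for illustration only.

```python
import math


def perona_malik_step(img, kappa=15.0, lam=0.2):
    """One explicit Perona-Malik step on a 2D list of floats.

    Assumed formulation (not taken from the thesis): 4-neighbour stencil,
    exponential conductance, zero-flux borders. lam <= 0.25 keeps the
    explicit scheme stable.
    """
    h, w = len(img), len(img[0])

    def g(d):
        # Edge-stopping function: close to 1 in flat regions,
        # close to 0 across strong edges.
        return math.exp(-(d / kappa) ** 2)

    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            c = img[y][x]
            # Differences to the four neighbours; clamping the index at
            # the border yields a zero difference, i.e. zero flux.
            dn = img[max(y - 1, 0)][x] - c
            ds = img[min(y + 1, h - 1)][x] - c
            dw = img[y][max(x - 1, 0)] - c
            de = img[y][min(x + 1, w - 1)] - c
            # Each output pixel reads only the previous iteration, so all
            # pixels can be updated independently of one another.
            out[y][x] = c + lam * (g(dn) * dn + g(ds) * ds
                                   + g(dw) * dw + g(de) * de)
    return out
```

Because every output pixel depends only on its four neighbours in the previous iteration, the two loops could in principle be mapped to one thread per pixel on a GP-GPU or split across cores with a threading library. How the thesis actually partitions the work is not stated in the abstract.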

Tags: ATI Radeon HD 5870, Computer science, CUDA, Filtering, Image processing, MPI, nVidia, OpenCL, Tesla C2050, Thesis

April 22, 2013 by hgpu
