Skeleton-based Automatic Parallelization of Image Processing Algorithms for GPUs

Cedric Nugteren, Henk Corporaal, Bart Mesman

Eindhoven University of Technology, The Netherlands

International Conference on Embedded Computer Systems (SAMOS), 2011

DOI:10.1109/SAMOS.2011.6045441

@inproceedings{nugteren2011skeleton,
  title={Skeleton-based automatic parallelization of image processing algorithms for GPUs},
  author={Nugteren, C. and Corporaal, H. and Mesman, B.},
  booktitle={Embedded Computer Systems (SAMOS), 2011 International Conference on},
  pages={25--32},
  year={2011},
  organization={IEEE}
}

Graphics Processing Units (GPUs) are becoming increasingly important in high-performance computing. To obtain high-quality solutions, programmers have to parallelize their algorithms and map them efficiently onto the hardware. This task is far from trivial, which motivates automating it. In this paper, we present a technique that automatically parallelizes sequential code and maps it onto a GPU, without the need for code annotations. The technique is based on skeletonization and targets image processing algorithms. Skeletonization separates the structure of a parallel computation from the algorithm's functionality, enabling efficient implementations without requiring architecture knowledge from the programmer. We define a number of skeleton classes, each enabling GPU-specific parallelization techniques and optimizations, including automatic thread creation, on-chip memory usage and memory coalescing. Similar skeletonization techniques have recently been applied to GPUs; our work differs by using domain-specific skeletons and a finer-grained classification of algorithms. Compared with existing GPU code generators in general, skeleton-based parallelization can potentially achieve higher hardware efficiency, because skeletons enable algorithm restructuring. In a set of benchmarks, we show that the presented skeleton-based approach generates highly optimized code that achieves high data throughput. Additionally, we show that the automatically generated code performs close to, or on par with, manually mapped and optimized code. We conclude that skeleton-based parallelization for GPUs is promising, but we believe that future research must focus on identifying a finer-grained and more complete classification.
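
The separation described above, where a skeleton fixes the parallel structure and the programmer supplies only the per-pixel functionality, can be illustrated with a minimal host-side C++ sketch. The skeleton names `pixelToPixel` and `neighbourhoodToPixel`, the `Image` type, and the clamp-to-border policy are hypothetical choices made for illustration; they are not the paper's skeleton classes or its generated CUDA code. The sequential loops merely stand in for a grid of GPU threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major, single-channel image used only for this sketch.
struct Image {
    std::size_t width, height;
    std::vector<float> data;
    Image(std::size_t w, std::size_t h) : width(w), height(h), data(w * h, 0.0f) {}
    float at(std::size_t x, std::size_t y) const { return data[y * width + x]; }
    float& at(std::size_t x, std::size_t y) { return data[y * width + x]; }
};

// Hypothetical pixel-to-pixel skeleton: output pixel i depends only on input pixel i.
// The skeleton owns the parallel structure; the user function f owns the functionality.
// Each iteration plays the role of one GPU thread (tid = thread index); consecutive
// tids touch consecutive addresses, the access pattern that coalesces on a GPU.
template <typename F>
Image pixelToPixel(const Image& in, F f) {
    Image out(in.width, in.height);
    for (std::size_t tid = 0; tid < in.data.size(); ++tid) {
        out.data[tid] = f(in.data[tid]);
    }
    return out;
}

// Hypothetical neighbourhood-to-pixel skeleton: output (x, y) depends on the
// (2r+1) x (2r+1) window around (x, y). Out-of-range coordinates are clamped to the
// border (an assumed policy). On a GPU, this is the pattern where a tile plus its halo
// could be staged in on-chip memory; this host sketch reads the input directly.
template <typename F>
Image neighbourhoodToPixel(const Image& in, int r, F f) {
    assert(r >= 0);
    Image out(in.width, in.height);
    const int w = static_cast<int>(in.width);
    const int h = static_cast<int>(in.height);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            // Accessor handed to the user function: reads the neighbour at offset (dx, dy).
            auto nb = [&](int dx, int dy) {
                const int xx = std::min(std::max(x + dx, 0), w - 1);
                const int yy = std::min(std::max(y + dy, 0), h - 1);
                return in.at(static_cast<std::size_t>(xx), static_cast<std::size_t>(yy));
            };
            out.at(static_cast<std::size_t>(x), static_cast<std::size_t>(y)) = f(nb, r);
        }
    }
    return out;
}
```

A threshold would be written as `pixelToPixel(img, [](float v) { return v > 5.0f ? 1.0f : 0.0f; })`, and a 3×3 box blur as a `neighbourhoodToPixel` call with `r = 1` whose lambda averages `nb(dx, dy)` over the window. Only the lambdas change between algorithms; a code generator can then specialize each skeleton for the target (thread creation, on-chip buffering, coalesced access), which is where the abstract places the GPU-specific optimizations.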

Tags: Algorithms, Code generation, CUDA, Image processing, nVidia, nVidia GeForce GTX 470, Optimization

February 17, 2012 by hgpu
