
Skeleton-based Automatic Parallelization of Image Processing Algorithms for GPUs

Cedric Nugteren, Henk Corporaal, Bart Mesman
Eindhoven University of Technology, The Netherlands
International Conference on Embedded Computer Systems (SAMOS), 2011

@inproceedings{nugteren2011skeleton,
   title={Skeleton-based automatic parallelization of image processing algorithms for GPUs},
   author={Nugteren, C. and Corporaal, H. and Mesman, B.},
   booktitle={Embedded Computer Systems (SAMOS), 2011 International Conference on},
   pages={25--32},
   year={2011},
   organization={IEEE}
}



Graphics Processing Units (GPUs) are becoming increasingly important in high-performance computing. To maintain high-quality solutions, programmers have to parallelize and map their algorithms efficiently. This task is far from trivial, which creates a need to automate it. In this paper, we present a technique to automatically parallelize sequential code and map it onto a GPU, without the need for code annotations. This technique is based on skeletonization and is targeted at image processing algorithms. Skeletonization separates the structure of a parallel computation from the algorithm's functionality, enabling efficient implementations without requiring architecture knowledge from the programmer. We define a number of skeleton classes, each enabling GPU-specific parallelization techniques and optimizations, including automatic thread creation, on-chip memory usage and memory coalescing. Recently, similar skeletonization techniques have been applied to GPUs. Our work differs in that it uses domain-specific skeletons and a finer-grained classification of algorithms. Compared to existing GPU code generators in general, skeleton-based parallelization can potentially achieve a higher hardware efficiency by enabling algorithm restructuring through skeletons. In a set of benchmarks, we show that the presented skeleton-based approach generates highly optimized code, achieving high data throughput. Additionally, we show that the automatically generated code performs close to or on par with manually mapped and optimized code. We conclude that skeleton-based parallelization for GPUs is promising, but we believe that future research must focus on identifying a finer-grained and complete classification.
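
The abstract's central idea, separating the structure of a parallel computation from the algorithm's functionality, can be illustrated with a minimal sketch. The abstract does not list the paper's skeleton classes, so the two skeletons below (`pixel_to_pixel` and `neighbourhood_to_pixel`) and the kernels `threshold` and `box_mean` are hypothetical names chosen for illustration only. Plain Python loops stand in for the GPU mapping; in a real code generator the skeleton would decide thread creation and memory layout, and this sketch does not attempt that.

```python
from typing import Callable, List

Image = List[List[int]]


def pixel_to_pixel(image: Image, kernel: Callable[[int], int]) -> Image:
    """Hypothetical skeleton: each output pixel depends only on the
    input pixel at the same position, so all pixels are independent."""
    return [[kernel(pixel) for pixel in row] for row in image]


def neighbourhood_to_pixel(
    image: Image, kernel: Callable[[List[int]], int], radius: int = 1
) -> Image:
    """Hypothetical skeleton: each output pixel depends on a square
    window around the input pixel; coordinates are clamped at borders."""
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            ]
            row.append(kernel(window))
        result.append(row)
    return result


# The algorithm's functionality is written once, with no knowledge of
# how the skeleton iterates over (or parallelizes) the image.
def threshold(pixel: int) -> int:
    return 255 if pixel >= 128 else 0


def box_mean(window: List[int]) -> int:
    return sum(window) // len(window)


image = [[0, 100, 200], [50, 150, 250], [10, 128, 90]]
binary = pixel_to_pixel(image, threshold)
blurred = neighbourhood_to_pixel(image, box_mean)
```

In this sketch the same kernel could be reused unchanged if the loop nest inside a skeleton were swapped for a parallel implementation, which is the property the abstract attributes to skeletonization.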
