Anti-parallel Patterns in Fine-grain Data-parallel Programs

Jan Cornelis

Faculty of Science, Department of Computer Science and Applied Computer Science, Vrije Universiteit Brussel, Pleinlaan 2 B-1050 Brussels, Belgium

Vrije Universiteit Brussel, 2010

Parallel systems and parallel programming are becoming increasingly important. Developers in need of raw speed can no longer expect sequential processors to become faster and must turn to parallel platforms and parallel programs instead. But writing a parallel program is difficult, and writing one that performs well even more so. This thesis introduces the umbrella concept of "anti-parallel patterns": parts of a parallel program that cause its performance to be lower than expected. These patterns aim to help developers understand, estimate and improve the performance of their parallel programs. To achieve these goals, we model the effect of a given pattern on performance and supply remedies that can be applied when an anti-parallel pattern is encountered. To help model the behaviour of a pattern, we define benchmark programs: programs that contain only one anti-parallel pattern. We also discuss and test remedies that reduce the performance loss the pattern causes. An additional advantage of the benchmark programs is that they can be used to compare the effect of a pattern across different parallel platforms. This work defines four anti-parallel patterns that are commonly found in fine-grain data-parallel programs such as those running on NVIDIA GPUs. For each anti-parallel pattern we give a definition and a thorough discussion of its behaviour on these GPUs. We present a number of benchmark programs, written using NVIDIA’s CUDA technology, that model the behaviour of the pattern. We also present remedies and examples of how to apply them. Finally, we demonstrate the usefulness of anti-parallel patterns by considering a prefix sum implementation in CUDA. We show how the patterns can be used to understand, estimate and improve the program’s performance.
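For readers unfamiliar with the prefix sum mentioned at the end of the abstract, the following is a minimal sketch of a step-wise (Hillis-Steele style) inclusive scan, simulated sequentially in Python. It is not the thesis's CUDA implementation, and it does not reproduce any of the four anti-parallel patterns; the function name `inclusive_scan_steps` and the sample input are assumptions chosen purely for illustration. Each pass of the outer loop stands in for one data-parallel step in which, on a GPU, every element would be updated by its own thread.

```python
def inclusive_scan_steps(values):
    """Simulate a step-wise inclusive prefix sum (illustrative only).

    Hypothetical helper: each pass of the while loop models one
    data-parallel step; the inner loop models the per-element threads.
    """
    data = list(values)
    offset = 1
    while offset < len(data):
        # Read only from a snapshot of the previous step, as a
        # double-buffered parallel kernel would.
        previous = list(data)
        for i in range(offset, len(data)):
            data[i] = previous[i] + previous[i - offset]
        offset *= 2
    return data


print(inclusive_scan_steps([3, 1, 7, 0, 4, 1, 6, 3]))
# prints [3, 4, 11, 11, 15, 16, 22, 25]
```

The sketch needs about log2(n) steps for n elements, which is what makes the algorithm attractive on fine-grain data-parallel hardware; how well a real CUDA version performs depends on details this simulation does not capture.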

Tags: Benchmarking, Computer science, CUDA, Data parallelism, nVidia, nVidia GeForce 8500 GT, nVidia GeForce GTX 280, Performance, Thesis

October 10, 2011 by hgpu
