Anti-parallel Patterns in Fine-grain Data-parallel Programs
Faculty of Science, Department of Computer Science and Applied Computer Science, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussels, Belgium
Vrije Universiteit Brussel, 2010
@mastersthesis{cornelis2010anti,
title={Anti-parallel Patterns in Fine-grain Data-parallel Programs},
author={Cornelis, J.},
year={2010},
school={Vrije Universiteit Brussel, Belgium}
}
Parallel systems and parallel programming are becoming increasingly important. Developers in need of raw speed can no longer expect sequential processors to become faster and must turn to parallel platforms and parallel programs. But writing a parallel program is difficult, and writing one with decent performance even more so. This thesis introduces the umbrella concept of "anti-parallel patterns": parts of a parallel program that cause its performance to be lower than expected. The patterns aim to help developers understand, estimate and improve the performance of their parallel programs. To achieve these goals, we model the effect of each pattern on performance and supply remedies that can be applied when an anti-parallel pattern is encountered. To help model the behaviour of a pattern, we define benchmark programs: programs that contain only a single anti-parallel pattern. We also discuss and test remedies that can be applied to reduce the performance loss a pattern causes. An additional advantage of the benchmark programs is that they can be used to compare the effect of a pattern across different parallel platforms. This work defines four anti-parallel patterns that are commonly found in fine-grain data-parallel programs such as those running on NVIDIA GPUs. For each anti-parallel pattern we give a definition and a thorough discussion of its behaviour on these GPUs. We present a number of benchmark programs, written using NVIDIA's CUDA technology, to model the behaviour of each pattern. We also present remedies and examples of how to apply them. Finally, we demonstrate the usefulness of anti-parallel patterns by considering a prefix sum implementation in CUDA. We show how the patterns can be used to understand, estimate and improve the program's performance.
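For readers unfamiliar with the example application mentioned above, the sketch below shows a minimal single-block inclusive prefix sum (Hillis-Steele scan) written in CUDA. It is an illustrative sketch only: the kernel name, array names and problem size are assumptions made here, and this is not the implementation analysed in the thesis; it merely illustrates the kind of fine-grain data-parallel CUDA program the anti-parallel patterns target.

// Minimal single-block Hillis-Steele inclusive prefix sum in CUDA.
// Illustrative sketch only; names and sizes are hypothetical, not the
// thesis's implementation.
#include <cstdio>

#define N 256  // one block of N threads

__global__ void inclusiveScan(const int *in, int *out)
{
    __shared__ int temp[N];
    int tid = threadIdx.x;

    temp[tid] = in[tid];
    __syncthreads();

    // At each step, add the element 'offset' positions back.
    for (int offset = 1; offset < N; offset *= 2) {
        int val = 0;
        if (tid >= offset)
            val = temp[tid - offset];
        __syncthreads();          // all reads finish before any write
        temp[tid] += val;
        __syncthreads();          // all writes finish before the next step
    }

    out[tid] = temp[tid];
}

int main()
{
    int h_in[N], h_out[N];
    for (int i = 0; i < N; ++i) h_in[i] = 1;   // expected output: 1, 2, ..., N

    int *d_in, *d_out;
    cudaMalloc(&d_in,  N * sizeof(int));
    cudaMalloc(&d_out, N * sizeof(int));
    cudaMemcpy(d_in, h_in, N * sizeof(int), cudaMemcpyHostToDevice);

    inclusiveScan<<<1, N>>>(d_in, d_out);
    cudaMemcpy(h_out, d_out, N * sizeof(int), cudaMemcpyDeviceToHost);

    printf("last prefix sum = %d\n", h_out[N - 1]);  // should print 256

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

Even this small kernel exhibits behaviour the thesis is concerned with, for example the synchronization between scan steps, which is the kind of performance-limiting structure an anti-parallel pattern is meant to capture.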
October 10, 2011 by hgpu