Autotuning Wavefront Patterns for Heterogeneous Architectures


Siddharth Mohanty

Institute of Computing Systems Architecture, School of Informatics, University of Edinburgh

University of Edinburgh, 2015

@article{mohanty2015autotuning,
  title={Autotuning wavefront patterns for heterogeneous architectures},
  author={Mohanty, Siddharth},
  year={2015},
  publisher={The University of Edinburgh}
}


Manual tuning of applications for heterogeneous parallel systems is tedious and complex. Optimizations are often not portable, and the whole process must be repeated when moving to a new system, or sometimes even to a different problem size. Pattern-based parallel programming models were originally designed to provide programmers with an abstract layer, hiding tedious parallel boilerplate code and allowing a focus only on application-specific issues. However, the constrained algorithmic model associated with each pattern also enables the creation of pattern-specific optimization strategies. These can capture more complex variations than would be accessible by analysis of equivalent unstructured source code. These variations create complex optimization spaces. Machine learning offers well-established techniques for exploring such spaces. In this thesis we use machine learning to create autotuning strategies for heterogeneous parallel implementations of applications that follow the wavefront pattern. In a wavefront, computation starts from one corner of the problem grid and proceeds diagonally like a wave to the opposite corner in either two or three dimensions. Our framework partitions and optimizes the work created by these applications across systems comprising multicore CPUs and multiple GPU accelerators. The tuning opportunities for a wavefront include controlling the amount of computation to be offloaded onto GPU accelerators, choosing the number of CPU and GPU threads to process tasks, tiling for both CPU and GPU memory structures, and trading redundant halo computation against communication for multiple GPUs. Our exhaustive search of the problem space shows that these parameters are very sensitive to the combination of architecture, wavefront instance and problem size. We design and investigate a family of autotuning strategies, targeting single and multiple CPU + GPU systems, and both two- and three-dimensional wavefront instances. These yield an average of 87% of the performance found by offline exhaustive search, with up to 99% in some cases.
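
The wavefront traversal described in the abstract can be illustrated with a minimal sequential sketch in Python. It is not taken from the thesis: the function names `wavefront_2d` and `count_paths`, and the assumption that each cell depends on its north, west and north-west neighbours, are hypothetical choices made for illustration. The sketch visits a two-dimensional grid one anti-diagonal at a time, starting at the top-left corner and ending at the bottom-right corner. Under the assumed dependency pattern, cells on the same anti-diagonal do not depend on each other, which is the parallelism a CPU or GPU implementation would exploit.

```python
def wavefront_2d(rows, cols, cell):
    """Fill a rows x cols grid in wavefront (anti-diagonal) order.

    `cell(i, j, north, west, north_west)` computes one grid value from its
    already-computed neighbours. Out-of-grid neighbours are passed as 0.
    This dependency pattern is an assumption made for illustration.
    """
    grid = [[0] * cols for _ in range(rows)]
    order = []  # visiting order, recorded only to make the sweep visible

    # Anti-diagonal d contains every cell (i, j) with i + j == d.
    for d in range(rows + cols - 1):
        # Cells within one diagonal are independent of each other, so this
        # inner loop is the part that could run in parallel.
        for i in range(max(0, d - cols + 1), min(rows, d + 1)):
            j = d - i
            north = grid[i - 1][j] if i > 0 else 0
            west = grid[i][j - 1] if j > 0 else 0
            north_west = grid[i - 1][j - 1] if i > 0 and j > 0 else 0
            grid[i][j] = cell(i, j, north, west, north_west)
            order.append((i, j))
    return grid, order


def count_paths(i, j, north, west, north_west):
    """Example cell function: number of monotone lattice paths to (i, j)."""
    return 1 if i == 0 and j == 0 else north + west


if __name__ == "__main__":
    grid, order = wavefront_2d(3, 3, count_paths)
    for row in grid:
        print(row)
    print(order)
```

In this sketch the diagonals are short near the two corners and longest in the middle of the grid, so the amount of available parallelism varies over the course of the sweep.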

Tags: Algorithms, ATI, ATI Radeon HD 7970, Computer science, Heterogeneous systems, Machine learning, nVidia, nVidia GeForce GTX 480, nVidia GeForce GTX 580, OpenCL, Performance, Thesis

September 19, 2015 by hgpu

