Automatic run-time mapping of polyhedral computations to heterogeneous devices with memory-size restrictions
Departamento de Informatica, Edif. Tecn. de la Informacion, Universidad de Valladolid, Campus Miguel Delibes, 47011 Valladolid, Spain
The 19th International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’13), 2013
@article{torres2013automatic,
title={Automatic run-time mapping of polyhedral computations to heterogeneous devices with memory-size restrictions},
author={Torres, Yuri and Gonzalez-Escribano, Arturo and Llanos, Diego R.},
year={2013}
}
Tools that automatically map parallel computations to heterogeneous and hierarchical systems try to divide the whole computation into parts whose computational loads are adjusted to the capabilities of the target devices. Some parts are executed on node cores, while others are executed on accelerator devices. Each part requires one or more data-structure pieces that must be allocated in the device memory during the computation. In this paper we present a model that allows such automatic mapping tools to transparently assign computations to heterogeneous devices with different memory-size restrictions. The model requires the programmer to specify the access patterns of the computation threads in a simple abstract form. This information is used at run time to determine the second-level partition of the computation assigned to a device, ensuring that the data pieces required by each sub-part fit in the target device memory and that the number of kernels launched is minimal. We present experimental results with a prototype implementation of the model that works for regular polyhedral expressions. We show how it works for different example applications and access patterns, transparently executing large computations on devices with different memory-size restrictions.
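To make the second-level partition idea concrete, here is a minimal Python sketch. It is not the authors' implementation or API. It assumes a hypothetical one-dimensional case in which thread i reads elements i - halo_left through i + halo_right of a single array, so a contiguous chunk of c threads needs c + halo_left + halo_right elements in device memory. All names (partition_for_device, footprint_bytes, the halo parameters) are invented for illustration, and clipping of the halo at the domain boundaries is ignored, which makes the estimate conservative.

```python
from math import ceil


def footprint_bytes(chunk, halo_left, halo_right, elem_bytes):
    """Bytes needed on the device for one chunk (begin, end) plus its halo."""
    begin, end = chunk
    return (end - begin + halo_left + halo_right) * elem_bytes


def partition_for_device(n_threads, halo_left, halo_right, elem_bytes, mem_bytes):
    """Split [0, n_threads) into the fewest contiguous chunks that fit in memory.

    Hypothetical model: each chunk is one kernel launch, and its data
    footprint is the chunk length plus the halo read by its edge threads.
    """
    if n_threads == 0:
        return []
    # Largest chunk whose footprint (chunk + halo) still fits in device memory.
    max_chunk = mem_bytes // elem_bytes - halo_left - halo_right
    if max_chunk < 1:
        raise ValueError("device memory cannot hold the data of a single thread")
    # Fewest launches possible, since no chunk may exceed max_chunk threads.
    n_kernels = ceil(n_threads / max_chunk)
    # Balance chunk sizes; the largest is ceil(n_threads / n_kernels) <= max_chunk.
    base, extra = divmod(n_threads, n_kernels)
    chunks, start = [], 0
    for k in range(n_kernels):
        size = base + (1 if k < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


if __name__ == "__main__":
    # 1000 threads, 3-point stencil, 4-byte elements, 1 KiB of device memory.
    for chunk in partition_for_device(1000, 1, 1, 4, 1024):
        print(chunk, footprint_bytes(chunk, 1, 1, 4), "bytes")
```

In this simplified setting the abstract access pattern reduces to the two halo widths. A polyhedral model such as the one the paper describes would generalize this to multi-dimensional index spaces and several data structures, which this sketch does not attempt.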
October 12, 2013 by hgpu