Automatic run-time mapping of polyhedral computations to heterogeneous devices with memory-size restrictions



Yuri Torres, Arturo Gonzalez-Escribano, Diego R. Llanos

Departamento de Informática, Edif. Tecn. de la Información, Universidad de Valladolid, Campus Miguel Delibes, 47011 Valladolid, Spain

The 19th International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’13), 2013


Tools that aim to automatically map parallel computations to heterogeneous and hierarchical systems try to divide the whole computation into parts whose computational loads are adjusted to the capabilities of the target devices. Some parts are executed on node cores, while others are executed on accelerator devices. Each part requires one or more pieces of data structures that must be allocated in the device memory during the computation. In this paper we present a model that allows such automatic mapping tools to transparently assign computations to heterogeneous devices with different memory-size restrictions. The model requires the programmer to specify the access patterns of the computation threads in a simple abstract form. At run time, this information is used to determine the second-level partition of the computation assigned to a device, ensuring that the data pieces required by each sub-part fit in the target device memory and that the number of kernels launched is minimal. We present experimental results with a prototype implementation of the model that works for regular polyhedral expressions, and we show how it handles different example applications and access patterns, transparently executing large computations on devices with different memory-size restrictions.
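The run-time step described in the abstract (choosing a second-level partition so that every sub-part's data fits in device memory while launching as few kernels as possible) can be illustrated with a minimal sketch. It assumes a hypothetical case not taken from the paper: a 2D array of `total_rows × cols` elements split into contiguous row blocks, where each output row reads its own input row plus `halo` neighbouring rows on each side (a stencil-like pattern, one simple instance of a regular polyhedral access expression), and each block is executed by one kernel launch. The names `AccessPattern`, `footprint_bytes`, `min_subparts` and `row_blocks` are illustrative only and are not the authors' prototype API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPattern:
    """Hypothetical abstract access pattern: each output row reads its own
    input row plus `halo` neighbouring input rows on each side."""
    halo: int


def footprint_bytes(rows: int, total_rows: int, cols: int,
                    elem_bytes: int, pattern: AccessPattern) -> int:
    """Worst-case device bytes needed by one block of `rows` output rows."""
    # An interior block reads its own rows plus `halo` rows on each side,
    # but never more input rows than the whole domain contains.
    in_rows = min(rows + 2 * pattern.halo, total_rows)
    out_rows = rows
    return (in_rows + out_rows) * cols * elem_bytes


def min_subparts(total_rows: int, cols: int, elem_bytes: int,
                 pattern: AccessPattern, mem_bytes: int) -> int:
    """Smallest number of equal-size row blocks (one kernel launch each)
    whose data fits in `mem_bytes` of device memory."""
    def fits(parts: int) -> bool:
        rows = math.ceil(total_rows / parts)
        return footprint_bytes(rows, total_rows, cols, elem_bytes, pattern) <= mem_bytes

    # The finest possible split (one row per block) must fit at all.
    if not fits(total_rows):
        raise ValueError("one row plus its halo does not fit in device memory")
    # fits() is monotone: more parts -> smaller blocks -> smaller footprint,
    # so binary-search for the first number of parts that fits.
    lo, hi = 1, total_rows
    while lo < hi:
        mid = (lo + hi) // 2
        if fits(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


def row_blocks(total_rows: int, parts: int) -> list[tuple[int, int]]:
    """Half-open [start, end) ranges of ceil(total_rows / parts) rows each;
    the last block may be shorter."""
    size = math.ceil(total_rows / parts)
    return [(s, min(s + size, total_rows)) for s in range(0, total_rows, size)]


if __name__ == "__main__":
    pattern = AccessPattern(halo=1)  # e.g. a 3-point vertical stencil
    # 1000 x 1000 float32 elements, hypothetical 1,000,000-byte device budget
    parts = min_subparts(1000, 1000, 4, pattern, 1_000_000)
    print(parts)                        # 9
    print(row_blocks(1000, parts)[-1])  # (896, 1000)
```

Because the worst-case footprint of a block never grows as the number of blocks increases, a binary search is enough to find the minimal launch count. The model in the paper handles more general regular polyhedral access patterns than this single-halo case.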

Tags: Computer science, CUDA, Heterogeneous systems, Memory model, nVidia, nVidia GeForce GTX 680

October 12, 2013 by hgpu

