Improved Programming of GPU Architectures through Automated Data Allocation and Loop Restructuring
Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy
23rd International Conference on Architecture of Computing Systems (ARCS), 2010
@article{biagio2010improved,
title={Improved Programming of GPU Architectures through Automated Data Allocation and Loop Restructuring},
author={Di Biagio, A. and Agosta, G.},
journal={ARCS 2010},
year={2010},
publisher={VDE VERLAG GmbH}
}
The programmability of recent graphics processing unit (GPU) architectures has been the main factor driving the dramatic increase in interest in this class of architectures as low-cost accelerators for a wide range of high-performance applications. Current GPU programming models, such as OpenCL and CUDA, still expose too many architectural features, such as the memory hierarchy, to the programmer. We propose to raise the abstraction level of code by mapping some constructs of the well-known OpenMP parallel programming model onto the dominant CUDA GPU programming model. To this end, we are studying solutions for two main issues: the automated allocation of data on the GPU device memory hierarchy, and the translation of OpenMP parallel loops to CUDA kernels. We report some initial experimental results showing that the transformations are indeed promising.
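To make the loop-to-kernel idea in the abstract concrete, the following is a minimal sketch in plain C, not the authors' actual transformation. It assumes a SAXPY loop as the example workload and emulates the CUDA index computation (block index times block size plus thread index) on the host, so it can be compiled without a GPU toolchain. The names saxpy_openmp, saxpy_kernel_body and saxpy_emulated_launch are hypothetical and do not come from the paper.

```c
#include <stddef.h>

/* Source form: an OpenMP parallel loop. SAXPY is an assumed example,
 * not a benchmark taken from the paper. Without OpenMP support the
 * pragma is ignored and the loop simply runs sequentially. */
void saxpy_openmp(size_t n, float a, const float *x, float *y)
{
    #pragma omp parallel for
    for (long i = 0; i < (long)n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Target form: the loop body rewritten so that each (block, thread)
 * pair handles one iteration. In CUDA C this would be a __global__
 * function reading blockIdx.x, blockDim.x and threadIdx.x; here the
 * three values are passed in explicitly (hypothetical emulation). */
static void saxpy_kernel_body(size_t block_idx, size_t block_dim,
                              size_t thread_idx, size_t n, float a,
                              const float *x, float *y)
{
    size_t i = block_idx * block_dim + thread_idx;
    if (i < n)              /* guard: the last block may be partly idle */
        y[i] = a * x[i] + y[i];
}

/* Host-side emulation of a kernel launch: iterate over every block and
 * every thread in the grid. Assumes block_dim > 0. */
void saxpy_emulated_launch(size_t n, float a, const float *x, float *y,
                           size_t block_dim)
{
    size_t grid_dim = (n + block_dim - 1) / block_dim;  /* ceil(n / block_dim) */
    for (size_t b = 0; b < grid_dim; ++b)
        for (size_t t = 0; t < block_dim; ++t)
            saxpy_kernel_body(b, block_dim, t, n, a, x, y);
}
```

Both functions should produce the same result for any positive block size, because the guard discards the surplus threads of the last block. The sketch covers only the loop restructuring; the placement of x and y within the device memory hierarchy, which the abstract names as the second issue, is not modeled here.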
June 21, 2011  by hgpu


