Code Generation Compiler for the OpenMP 4.0 Accelerator Model onto OMPSS
Universitat Politecnica de Catalunya (UPC-Barcelona TECH)
Universitat Politecnica de Catalunya, 2014
@article{ozen2014code,
title={Code generation for the {OpenMP} 4.0 accelerator model onto {OmpSs}},
author={Ozen, Guray},
publisher={Universitat Polit{\`e}cnica de Catalunya},
year={2014}
}
OpenMP, a well-known shared-memory programming API, aims to make shared-memory multiprocessor programming easy through pragma directives. Until now, its interface has offered task-level and iteration-level parallelism for general-purpose CPUs; its latest 4.0 specification, however, adds an accelerator model. OmpSs is an OpenMP-extended parallel programming model developed at the Barcelona Supercomputing Center, and it has already supported accelerators, although without code generation. The main objective of OmpSs is to orchestrate different kinds of tasks. Its design is strongly biased toward delegating most decisions to the runtime system, which, based on the task graph built at runtime (from depend clauses), schedules tasks in a dataflow manner onto the available processors and accelerator devices and orchestrates data transfers and reuse among multiple address spaces. In this thesis I present MACC, a compiler that partially implements the 4.0 accelerator model in the OmpSs programming model, with the aim of identifying what the roles of the programmer, the compiler and the runtime system should be in order to facilitate the asynchronous execution of tasks on architectures with multiple accelerator devices and processors. For this reason the implementation is partial: it considers only those 4.0 directives that enable the compiler to generate the so-called "kernels" to be executed on the target device. Several extensions to the current specification are also presented, such as the specification of tasks in "native" CUDA and OpenCL, or how to specify the device and data privatization in the target construct. Finally, the thesis discusses some challenges found in code generation and a preliminary performance evaluation with some kernel applications. Before turning to GPU code generation, pure GPU programming models are discussed and their strengths and pitfalls are shown.
In addition, challenges of GPU programming, such as memory usage, multi-GPU execution and streaming, are discussed. Based on this experience, several methods are developed and applied to the MACC compiler. In essence, MACC’s extensions to OpenMP 4.0 are based on these methods.
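For readers unfamiliar with the accelerator model mentioned in the abstract, the following is a minimal sketch of a loop offloaded with standard OpenMP 4.0 directives. It uses only the standard `target` combined construct and `map` clauses; it does not show MACC's extended syntax, which the abstract does not spell out. The function name `saxpy_offload` and its parameters are hypothetical, chosen only for illustration.

```c
#include <assert.h>

/* Hypothetical example: y = a*x + y, written with standard OpenMP 4.0
 * accelerator directives. This is NOT MACC-specific syntax.
 * If the compiler has no OpenMP (or no offloading) support, the pragma is
 * ignored or falls back to the host, and the loop still computes the result. */
void saxpy_offload(int n, float a, const float *x, float *y)
{
    assert(n > 0);  /* keep the mapped array sections non-empty */

    /* map(to: ...)     copies x to the device before the region runs;
     * map(tofrom: ...) copies y to the device and back to the host after. */
    #pragma omp target teams distribute parallel for \
            map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The loop body is what a compiler for this model would turn into a device "kernel", while the `map` clauses describe the data movement between the host and device address spaces.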
November 29, 2014 by hgpu