high performance computing on graphics processing units: hgpu.org


Source-to-Source Automatic Program Transformations for GPU-like Hardware Accelerators

Mehdi Amini

CRI – Centre de Recherche en Informatique

pastel-00958033 (11 March 2014)

@phdthesis{amini:pastel-00958033,
  title={Source-to-Source Automatic Program Transformations for GPU-like Hardware Accelerators},
  author={Amini, Mehdi},
  url={https://pastel.archives-ouvertes.fr/pastel-00958033},
  number={2012ENMP0105},
  school={Ecole Nationale Sup{\'e}rieure des Mines de Paris},
  year={2014},
  month={Dec},
  keywords={GPU; CUDA; OpenCL; Parall{\'e}lisation automatis{\'e}e; Compilation; Automatic Parallelization},
  type={Theses},
  pdf={https://pastel.archives-ouvertes.fr/pastel-00958033/file/2012ENMP0105.pdf},
  hal_id={pastel-00958033},
  hal_version={v1}
}


Since the beginning of the 2000s, the raw performance of processors has stopped increasing exponentially. Modern graphics processing units (GPUs) are designed as arrays of hundreds or thousands of compute units. Their compute capacity quickly led to GPUs being diverted from their original purpose and used as accelerators for general-purpose computation. However, programming a GPU efficiently for computations other than 3D rendering remains challenging.

The current jungle in the hardware ecosystem is mirrored in the software world, with more and more programming models, new languages, different APIs, etc., but no one-size-fits-all solution has emerged.

This thesis proposes a compiler-based solution that partially addresses the three "P" properties: Performance, Portability, and Programmability. The goal is to automatically transform a sequential program into an equivalent program accelerated with a GPU. A prototype, Par4All, is implemented and validated with numerous experiments. Programmability and portability are guaranteed by construction; performance may not match what an expert programmer can obtain, but it has been measured as excellent for a wide range of kernels and applications.

A survey of GPU architectures and of trends in language and framework design is presented. Data movement between the host and the accelerator is managed without involving the developer. An algorithm is proposed to optimize communication by sending data to the GPU as early as possible and keeping it there as long as the host does not require it. Loop transformation techniques are used for kernel code generation, and even well-known ones must be adapted to specific GPU constraints. They are combined in a coherent and flexible way and dynamically scheduled within the compilation process of an interprocedural compiler. Preliminary work on extending the approach to multiple GPUs is also presented.
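The communication-optimization idea (send data to the GPU as early as possible, keep it there until the host needs it) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the algorithm implemented in Par4All: it assumes a straight-line program whose statements are tagged as running on the host or the GPU, with read/write sets over whole arrays, and assumes every write overwrites the entire array. The names `Stmt` and `place_transfers` are hypothetical. The sketch tracks where each array's current value is valid, schedules each host-to-GPU copy right after the array's last host write, and copies an array back only when a host statement reads it or it is live at program exit.

```python
from dataclasses import dataclass

HOST, GPU = "host", "gpu"


@dataclass(frozen=True)
class Stmt:
    # Hypothetical toy IR: one statement of a straight-line program.
    where: str                        # HOST or GPU
    reads: frozenset = frozenset()    # arrays read by the statement
    writes: frozenset = frozenset()   # arrays written (assumed fully overwritten)


def place_transfers(prog, arrays, live_out):
    """Return (copy_in, copy_out): dicts mapping a position k in 0..len(prog)
    to the set of arrays transferred just before statement k
    (k == len(prog) means at program exit)."""
    valid_host = set(arrays)                # every array starts valid on the host only
    valid_gpu = set()
    last_host_def = {a: 0 for a in arrays}  # earliest legal point for a copy-in
    copy_in, copy_out = {}, {}
    for k, s in enumerate(prog):
        if s.where == GPU:
            for a in sorted(s.reads - valid_gpu):
                # Send as early as possible: right after the last host write.
                copy_in.setdefault(last_host_def[a], set()).add(a)
                valid_gpu.add(a)
            valid_gpu |= s.writes           # results stay on the GPU...
            valid_host -= s.writes          # ...and the host copy becomes stale
        else:
            for a in sorted(s.reads - valid_host):
                # Copy back only when the host actually needs the value.
                copy_out.setdefault(k, set()).add(a)
                valid_host.add(a)
            valid_host |= s.writes
            valid_gpu -= s.writes           # a host write invalidates the GPU copy
            for a in s.writes:
                last_host_def[a] = k + 1
    # Arrays needed after the program ends must be valid on the host.
    for a in sorted(set(live_out) - valid_host):
        copy_out.setdefault(len(prog), set()).add(a)
    return copy_in, copy_out


# Example: initialize a and b on the host, run two kernels, then print c.
prog = [
    Stmt(HOST, writes=frozenset({"a"})),                              # 0: init a
    Stmt(HOST, writes=frozenset({"b"})),                              # 1: init b
    Stmt(GPU, reads=frozenset({"a", "b"}), writes=frozenset({"c"})),  # 2: kernel 1
    Stmt(GPU, reads=frozenset({"c"}), writes=frozenset({"c"})),       # 3: kernel 2
    Stmt(HOST, reads=frozenset({"c"})),                               # 4: print c
]
copy_in, copy_out = place_transfers(prog, ["a", "b", "c"], live_out=["c"])
print(copy_in)   # {1: {'a'}, 2: {'b'}}
print(copy_out)  # {4: {'c'}}
```

In this example `c` stays on the GPU between the two kernels and is copied back only before the host reads it, while `a` is sent right after its initialization rather than just before the first kernel, which leaves room to overlap that transfer with the initialization of `b`. A naive per-kernel scheme would instead copy `c` back to the host after the first kernel and upload it again before the second.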

Tags: ATI, ATI Radeon HD 6970, Code generation, Computer science, CUDA, nVidia, nVidia GeForce 8800 GTX, nVidia GeForce GTX 670, OpenCL, Tesla C1060, Tesla C2070, Thesis

August 24, 2015 by hgpu
