high performance computing on graphics processing units: hgpu.org

Source-to-Source Automatic Program Transformations for GPU-like Hardware Accelerators

Mehdi Amini

CRI – Centre de Recherche en Informatique

pastel-00958033, (11 March 2014)

@phdthesis{amini:pastel-00958033,

title={Source-to-Source Automatic Program Transformations for GPU-like Hardware Accelerators},

author={Amini, Mehdi},

url={https://pastel.archives-ouvertes.fr/pastel-00958033},

number={2012ENMP0105},

school={Ecole Nationale Sup{\'e}rieure des Mines de Paris},

year={2014},

month={Dec},

keywords={GPU; CUDA; OpenCL; Parall{\'e}lisation automatis{\'e}e; Compilation; Automatic Parallelization},

type={Theses},

pdf={https://pastel.archives-ouvertes.fr/pastel-00958033/file/2012ENMP0105.pdf},

hal_id={pastel-00958033},

hal_version={v1}

}

Since the beginning of the 2000s, the raw performance of processors has stopped increasing exponentially. Modern graphics processing units (GPUs) have been designed as arrays of hundreds or thousands of compute units. Their compute capacity quickly led them to be diverted from their original purpose and used as accelerators for general-purpose computation. However, programming a GPU efficiently for computations other than 3D rendering remains challenging.

The current jungle in the hardware ecosystem is mirrored in the software world, with more and more programming models, new languages, different APIs, etc. But no one-size-fits-all solution has emerged.

This thesis proposes a compiler-based solution to partially address the three "P" properties: Performance, Portability, and Programmability. The goal is to automatically transform a sequential program into an equivalent program accelerated with a GPU. A prototype, Par4All, is implemented and validated with numerous experiments. Programmability and portability are enforced by definition; the performance may not be as good as what an expert programmer can obtain, but it has still been measured as excellent for a wide range of kernels and applications.

A survey of GPU architectures and of trends in language and framework design is presented. Data movement between the host and the accelerator is managed without involving the developer. An algorithm is proposed to optimize communication by sending data to the GPU as early as possible and keeping it on the GPU as long as it is not required by the host. Loop transformation techniques for kernel code generation are involved, and even well-known ones have to be adapted to match specific GPU constraints. They are combined in a coherent and flexible way and dynamically scheduled within the compilation process of an interprocedural compiler. Some preliminary work is presented on extending the approach to multiple GPUs.
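The communication policy mentioned in the abstract (keep data on the GPU until the host actually needs it) can be illustrated with a toy model. The sketch below is not the thesis's algorithm and not Par4All's API: the class `TransferTracker`, its methods, and `naive_copy_count` are hypothetical names introduced only for illustration, and the model tracks residency at run time in plain Python, whereas the abstract describes a compiler-based approach. It also covers only the "keep data on the GPU" half of the idea, not the "send as early as possible" half.

```python
class TransferTracker:
    """Toy residency model (hypothetical, for illustration only).

    Tracks which named arrays have an up-to-date copy on the host and on
    the device, and logs a copy only when a stale side is about to be read.
    """

    def __init__(self):
        self.valid_on_host = set()
        self.valid_on_device = set()
        self.copies = []  # log of ("h2d" | "d2h", array_name)

    def host_define(self, name):
        # A host write makes the host copy current and the device copy stale.
        self.valid_on_host.add(name)
        self.valid_on_device.discard(name)

    def run_kernel(self, reads, writes):
        # Copy inputs to the device only if the device copy is stale.
        for name in sorted(reads):
            if name not in self.valid_on_device:
                self.copies.append(("h2d", name))
                self.valid_on_device.add(name)
        # Kernel outputs live on the device; no copy back yet.
        for name in sorted(writes):
            self.valid_on_device.add(name)
            self.valid_on_host.discard(name)

    def host_use(self, name):
        # Copy back only when the host reads a stale array.
        if name not in self.valid_on_host:
            self.copies.append(("d2h", name))
            self.valid_on_host.add(name)


def naive_copy_count(kernels):
    """Copies made if every kernel copies all inputs in and all outputs out."""
    return sum(len(reads) + len(writes) for reads, writes in kernels)


# A chain of three kernels: a -> b -> c -> d (array names are made up).
kernels = [({"a"}, {"b"}), ({"b"}, {"c"}), ({"c"}, {"d"})]

tracker = TransferTracker()
tracker.host_define("a")
for reads, writes in kernels:
    tracker.run_kernel(reads, writes)
tracker.host_use("d")  # the host only ever needs the final result
```

For this three-kernel chain the model logs two copies (`a` to the device, `d` back to the host), while the naive copy-in/copy-out scheme counted by `naive_copy_count` would make six. The intermediate arrays `b` and `c` never leave the device because the host never reads them.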

Tags: ATI, ATI Radeon HD 6970, Code generation, Computer science, CUDA, nVidia, nVidia GeForce 8800 GTX, nVidia GeForce GTX 670, OpenCL, Tesla C1060, Tesla C2070, Thesis

August 24, 2015 by hgpu

