A CPU-GPU Hybrid Runtime for the Aeminium Language

Alcides Fonseca
Departamento de Engenharia Informatica, Faculdade de Ciencias e Tecnologia, Universidade de Coimbra
Universidade de Coimbra, 2011









Given that CPU clock speeds are stagnating, programmers are resorting to parallelism to improve the performance of their applications. Such parallelism has usually been attained with multicore architectures, multiple CPUs, or clusters of machines, but the GPU has more recently emerged as an alternative. GPUs are an interesting resource because they can provide much more processing power at a fraction of the cost of CPUs. However, GPU programming is not an easy task. Developers who do not understand the programming model and the hardware architecture of a GPU will not be able to extract all of its processing potential. Furthermore, it is even harder to write GPU code that outperforms an optimized CPU version.
This thesis proposes a high-level programming framework for parallel programs on both CPUs and GPUs. This approach, named AeminiumGPU, draws inspiration from Functional Programming and currently allows developers to implement programs based on the Map-Reduce pattern. In the future, the framework can be extended with other higher-order functions. AeminiumGPU does not force developers to understand the particularities of GPU programming. They write programs in pure Java (and soon Aeminium), and specific parts of that code are compiled to OpenCL and executed on the GPU.
To generate code with good performance, AeminiumGPU performs optimizations specific to the architecture of GPUs. For instance, it avoids unnecessary compilations and data transfers. Despite these optimizations, programs will not always run faster just by executing them on the GPU: CPU code can still outperform its GPU version. To handle such cases and to ensure the fastest version is always executed, AeminiumGPU automatically decides whether a particular operation should be executed on the GPU or the CPU. These decisions are based on code complexity and input data size, collected at compile time and at run time.
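As an illustration of the Map-Reduce pattern mentioned above, the following is a minimal sketch in plain Java. It uses the standard java.util.stream API as a stand-in and is not AeminiumGPU's API; the class name MapReduceSketch and its methods are hypothetical. The point it tries to show is that a side-effect-free map function and an associative reduction leave the runtime free to choose how, and in principle where, each element is processed.

```java
import java.util.Arrays;

class MapReduceSketch {
    // Map step: a pure function applied to each element independently.
    static double square(double x) {
        return x * x;
    }

    // Map every element, then reduce with an associative operation (sum).
    // The parallel stream is only a CPU stand-in for a hybrid runtime.
    static double sumOfSquares(double[] input) {
        return Arrays.stream(input)
                .parallel()
                .map(MapReduceSketch::square)
                .sum();
    }

    public static void main(String[] args) {
        double[] data = {1.0, 2.0, 3.0, 4.0};
        System.out.println(sumOfSquares(data));
    }
}
```

Because the caller only supplies the two functions, the same program text could be handed to either device without changes, which is the property a framework of this kind relies on.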
AeminiumGPU contributes to reducing the development time and effort required for writing GPU programs. The framework also increases the performance of Java and Aeminium code. The contributions of this thesis also include a cost model for reasoning about the fastest architecture for a given program block.
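The kind of decision such a cost model supports can be sketched as follows. This is not the model from the thesis: the class DispatchSketch, its formulas, and all constants are assumptions chosen for illustration, and a real system would calibrate them by benchmarking the target hardware. The sketch only captures the two inputs named in the abstract, code complexity (operations per element) and input data size.

```java
class DispatchSketch {
    enum Device { CPU, GPU }

    // Hypothetical constants, for illustration only.
    static final double CPU_NS_PER_OP = 1.0;              // cost of one operation on the CPU
    static final double GPU_NS_PER_OP = 0.05;             // amortized cost of one operation on the GPU
    static final double GPU_LAUNCH_OVERHEAD_NS = 50_000.0; // fixed cost of starting a GPU kernel
    static final double TRANSFER_NS_PER_BYTE = 0.25;      // cost of moving data to the GPU

    // Estimated CPU time: proportional to total work.
    static double estimateCpuNs(long elements, int opsPerElement) {
        return elements * (double) opsPerElement * CPU_NS_PER_OP;
    }

    // Estimated GPU time: fixed launch overhead, plus data transfer, plus compute.
    static double estimateGpuNs(long elements, int opsPerElement, int bytesPerElement) {
        double transfer = elements * (double) bytesPerElement * TRANSFER_NS_PER_BYTE;
        double compute = elements * (double) opsPerElement * GPU_NS_PER_OP;
        return GPU_LAUNCH_OVERHEAD_NS + transfer + compute;
    }

    // Pick the device with the lower estimate; ties go to the CPU.
    static Device choose(long elements, int opsPerElement, int bytesPerElement) {
        double cpu = estimateCpuNs(elements, opsPerElement);
        double gpu = estimateGpuNs(elements, opsPerElement, bytesPerElement);
        return gpu < cpu ? Device.GPU : Device.CPU;
    }

    public static void main(String[] args) {
        System.out.println(choose(100, 10, 8));          // small input
        System.out.println(choose(10_000_000, 100, 8));  // large input, heavy per-element work
        System.out.println(choose(10_000_000, 1, 8));    // large input, trivial per-element work
    }
}
```

Under these assumed constants, small inputs stay on the CPU because the launch overhead dominates, and large inputs with trivial per-element work also stay on the CPU because the transfer cost outweighs the compute savings. Only large inputs with enough work per element are sent to the GPU.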