Accelerating Habanero-Java Programs with OpenCL Generation
Department of Computer Science, Rice University, Houston, TX, USA
10th International Conference on the Principles and Practice of Programming in Java (PPPJ), 2013
@inproceedings{hayashi2013accelerating,
title={Accelerating Habanero-Java programs with OpenCL generation},
author={Hayashi, Akihiro and Grossman, Max and Zhao, Jisheng and Shirako, Jun and Sarkar, Vivek},
booktitle={Proceedings of the 2013 International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools},
pages={124--134},
year={2013},
organization={ACM}
}
The initial wave of programming models for general-purpose computing on GPUs, led by CUDA and OpenCL, has provided experts with low-level constructs to obtain significant performance and energy improvements on GPUs. However, these programming models are characterized by a challenging learning curve for non-experts due to their complex and low-level APIs. Looking to the future, improving the accessibility of GPUs and accelerators for mainstream software developers is crucial to bringing the benefits of these heterogeneous architectures to a broader set of application domains. A key challenge in doing so is that mainstream developers are accustomed to working with high-level managed languages, such as Java, rather than lower-level native languages such as C, CUDA, and OpenCL. The OpenCL standard enables portable execution of SIMD kernels across a wide range of platforms, including multi-core CPUs, many-core GPUs, and FPGAs. However, using OpenCL from Java to program multi-architecture systems is difficult and error-prone. Programmers must explicitly perform a number of low-level operations, such as (1) managing data transfers between the host system and the GPU, (2) writing kernels in the OpenCL kernel language, (3) compiling kernels and performing other OpenCL initialization, and (4) using the Java Native Interface (JNI) to access the C/C++ APIs for OpenCL. In this paper, we present compile-time and run-time techniques for accelerating programs written in Java using automatic generation of OpenCL as a foundation. Our approach includes (1) automatic generation of OpenCL kernels and JNI glue code from a parallel-for construct (forall) available in the Habanero-Java (HJ) language, (2) leveraging HJ’s array view language construct to efficiently support rectangular, multi-dimensional arrays on OpenCL devices, and (3) implementing HJ’s phaser (next) construct for all-to-all barrier synchronization in automatically generated OpenCL kernels.
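The three HJ constructs listed above can be illustrated with a plain-Java sketch. It is not HJ syntax and not the OpenCL or JNI code that HJ-OpenCL generates; `ArrayView2D` and `normalizeRows` are hypothetical names. The view lays a rectangular 2-D shape over one flat `double[]` (a single contiguous buffer, which is what makes such arrays cheap to hand to an OpenCL device), one thread per iteration stands in for a `forall`, and `java.util.concurrent.Phaser` stands in for the phaser `next` barrier separating two phases of the loop body.

```java
import java.util.concurrent.Phaser;

// Plain-Java sketch, NOT HJ syntax and NOT the code HJ-OpenCL generates.
// ArrayView2D and normalizeRows are hypothetical names used for illustration.
public class HjOpenClSketch {

    /** Row-major rectangular 2-D view over a flat 1-D buffer (in the spirit of an HJ array view). */
    static final class ArrayView2D {
        final double[] data; // one contiguous buffer that could be copied to a device as-is
        final int rows;
        final int cols;

        ArrayView2D(double[] data, int rows, int cols) {
            if (rows < 0 || cols < 0 || data.length < rows * cols) {
                throw new IllegalArgumentException("buffer too small for view");
            }
            this.data = data;
            this.rows = rows;
            this.cols = cols;
        }

        double get(int i, int j) { return data[i * cols + j]; }

        void set(int i, int j, double v) { data[i * cols + j] = v; }
    }

    /**
     * forall-style loop over rows, one thread per iteration (a stand-in for
     * OpenCL work-items), with an all-to-all barrier between two phases:
     *   phase 1: partial[i] = sum of row i
     *   barrier: every iteration waits for all others (like HJ's next)
     *   phase 2: divide every element of row i by the sum of all partials
     * Returns the per-row sums computed in phase 1.
     */
    static double[] normalizeRows(ArrayView2D a) {
        int n = a.rows;
        double[] partial = new double[n];
        Phaser phaser = new Phaser(n); // n registered parties: one per loop iteration
        Thread[] workers = new Thread[n];
        for (int t = 0; t < n; t++) {
            final int i = t;
            workers[t] = new Thread(() -> {
                double s = 0;
                for (int j = 0; j < a.cols; j++) s += a.get(i, j);
                partial[i] = s;
                // Writes before arriving happen-before reads after the phase advances.
                phaser.arriveAndAwaitAdvance();
                double total = 0;
                for (double p : partial) total += p;
                for (int j = 0; j < a.cols; j++) a.set(i, j, a.get(i, j) / total);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // join also makes the workers' writes visible to the caller
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return partial;
    }
}
```

For example, viewing `{1, 2, 3, 4}` as a 2x2 array gives row sums 3 and 7, and the buffer becomes `{0.1, 0.2, 0.3, 0.4}`. Without the barrier, an iteration could read another row's partial sum before it had been written; that ordering requirement is what the `next` construct expresses.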
Our approach is named HJ-OpenCL. In contrast to past approaches to generating CUDA or OpenCL from high-level languages, the HJ-OpenCL approach helps the programmer preserve Java exception semantics by generating both exception-safe and exception-unsafe code regions. Which region executes is selected at runtime, based on the safe language construct introduced in this paper. We use a set of ten Java benchmarks to evaluate our approach, and observe performance improvements due to both native OpenCL execution and parallelism. On an AMD APU, our results show speedups of up to 36.7x relative to sequential Java when executing on the host 4-core CPU, and of up to 55.0x on the integrated GPU. For a system with an Intel Xeon CPU and a discrete NVIDIA Fermi GPU, the speedups relative to sequential Java are 35.7x for the 12-core CPU and 324.0x for the GPU. Further, we find that the best-performing execution mode (JVM execution, OpenCL CPU execution, or OpenCL GPU execution) differs from application to application. The language features, compiler extensions, and runtime extensions included in this work enable portability, rapid prototyping, and transparent execution of JVM applications across all OpenCL platforms.
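The abstract says HJ-OpenCL generates both exception-safe and exception-unsafe regions and picks one at runtime via the safe construct, but it does not show the construct itself. The Java sketch below assumes the choice behaves like a guard evaluated before the loop: if the guard proves no `ArrayIndexOutOfBoundsException` can occur, an unordered parallel path runs (standing in for a generated OpenCL kernel); otherwise a sequential JVM loop runs and throws exactly where sequential Java would. `SafeRegionSketch`, `gather`, and `lastPath` are hypothetical names.

```java
import java.util.stream.IntStream;

// Sketch of runtime selection between an exception-unsafe fast path and an
// exception-safe fallback. The guard stands in for HJ-OpenCL's "safe"
// construct; its real syntax and semantics are assumptions here.
public class SafeRegionSketch {

    enum Path { UNSAFE_FAST, SAFE_JVM }

    /** Which path the most recent call took (recorded only for illustration). */
    static Path lastPath;

    /** Computes out[i] = in[idx[i]] for every i in [0, out.length). */
    static void gather(double[] in, int[] idx, double[] out) {
        // Guard evaluated at run time: if it holds, no index can be out of bounds,
        // so the loop may run without preserving Java's exception ordering.
        boolean safe = out.length <= idx.length;
        for (int i = 0; safe && i < out.length; i++) {
            safe = idx[i] >= 0 && idx[i] < in.length;
        }
        if (safe) {
            lastPath = Path.UNSAFE_FAST;
            // Stand-in for a generated OpenCL kernel: iterations run in parallel, in no fixed order.
            IntStream.range(0, out.length).parallel().forEach(i -> out[i] = in[idx[i]]);
        } else {
            lastPath = Path.SAFE_JVM;
            // Sequential JVM loop: elements before the faulting index are written,
            // then the exception is thrown exactly where sequential Java would throw it.
            for (int i = 0; i < out.length; i++) {
                out[i] = in[idx[i]];
            }
        }
    }
}
```

The design point is that a parallel or device-side run cannot easily reproduce which exception sequential Java would raise first, or which elements it would already have written; checking the condition up front lets the fast path skip that bookkeeping entirely.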
September 22, 2013 by hgpu