high performance computing on graphics processing units: hgpu.org

Compiling and Optimizing Java 8 Programs for GPU Execution

Kazuaki Ishizaki, Akihiro Hayashi, Gita Koblents, Vivek Sarkar

IBM Research, Tokyo

24th International Conference on Parallel Architectures and Compilation Techniques (PACT), 2015

@inproceedings{ishizaki2015compiling,
  title={Compiling and Optimizing Java 8 Programs for GPU Execution},
  author={Ishizaki, Kazuaki and Hayashi, Akihiro and Koblents, Gita and Sarkar, Vivek},
  booktitle={24th International Conference on Parallel Architectures and Compilation Techniques (PACT)},
  year={2015}
}

GPUs can enable significant performance improvements for certain classes of data parallel applications and are widely used in recent computer systems. However, GPU execution currently requires explicit low-level operations such as 1) managing memory allocations and transfers between the host system and the GPU, 2) writing GPU kernels in a low-level programming model such as CUDA or OpenCL, and 3) optimizing the kernels by utilizing appropriate memory types on the GPU. Because of this complexity, in many cases, only expert programmers can exploit the computational capabilities of GPUs through the CUDA/OpenCL languages. This is unfortunate since a large number of programmers use high-level languages, such as Java, due to their advantages of productivity, safety, and platform portability, but would still like to exploit the performance benefits of GPUs. Thus, one challenging problem is how to utilize GPUs while allowing programmers to continue to benefit from the productivity advantages of languages like Java. This paper presents a just-in-time (JIT) compiler that can generate and optimize GPU code from a pure Java program written using lambda expressions with the new parallel streams APIs in Java 8. These APIs allow Java programmers to express data parallelism at a higher level than threads and tasks. Our approach translates lambda expressions with parallel streams APIs in Java 8 into GPU code and automatically generates runtime calls that handle the low-level operations mentioned above. Additionally, our optimization techniques 1) allocate and align the starting address of the Java array body in the GPUs with the memory transaction boundary to increase memory bandwidth, 2) utilize read-only cache for array accesses to increase memory efficiency in GPUs, and 3) eliminate redundant data transfer between the host and the GPU. 
The compiler also performs loop versioning for eliminating redundant exception checks and for supporting virtual method invocations within GPU kernels. These features and optimizations are supported and automatically performed by a JIT compiler that is built on top of a production version of the IBM Java 8 runtime environment. Our experimental results on an NVIDIA Tesla GPU show significant performance improvements over sequential execution (127.9x geometric mean) and parallel execution (3.3x geometric mean) for eight Java 8 benchmark programs running on a 160-thread POWER8 machine. This paper also includes an in-depth analysis of GPU execution to show the impact of our optimization techniques by selectively disabling each optimization. Our experimental results show a geometric-mean speed-up of 1.15x in the GPU kernel over state-of-the-art approaches. Overall, our JIT compiler can improve the performance of Java 8 programs by automatically leveraging the computational capability of GPUs.
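As a rough illustration of the programming style the abstract describes, the sketch below shows a data-parallel loop written with a Java 8 lambda and a parallel stream, followed by a hand-written picture of what loop versioning means. The class and method names (ParallelStreamSketch, vectorAdd, vectorAddVersioned) are hypothetical and do not come from the paper. The versioned method only mimics the idea in plain Java source: a single up-front guard proves that no null-pointer or out-of-bounds exception can occur, so a compiler could drop per-access checks on the fast path. How the actual JIT compiler builds its guard and dispatches to the GPU is not specified in the abstract and is assumed here.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class ParallelStreamSketch {

    // Data-parallel vector addition expressed with a Java 8 parallel stream.
    // Each index i is independent, which is the kind of loop body the
    // abstract says can be translated into a GPU kernel.
    static void vectorAdd(double[] a, double[] b, double[] c) {
        IntStream.range(0, c.length)
                 .parallel()
                 .forEach(i -> c[i] = a[i] + b[i]);
    }

    // Hand-written illustration of loop versioning (an assumption about the
    // general technique, not the paper's generated code). One guard checks
    // every condition that could raise an exception inside the loop.
    static void vectorAddVersioned(double[] a, double[] b, double[] c, int n) {
        boolean safe = a != null && b != null && c != null
                && n >= 0 && n <= a.length && n <= b.length && n <= c.length;
        if (safe) {
            // Fast version: the guard proves every access is in bounds, so a
            // compiler may omit per-element checks (or offload this loop).
            for (int i = 0; i < n; i++) {
                c[i] = a[i] + b[i];
            }
        } else {
            // Slow version: the original loop with normal Java semantics, so
            // any exception is thrown at exactly the iteration that causes it.
            for (int i = 0; i < n; i++) {
                c[i] = a[i] + b[i];
            }
        }
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0};
        double[] b = {10.0, 20.0, 30.0};
        double[] c = new double[3];
        vectorAdd(a, b, c);
        System.out.println(Arrays.toString(c));

        double[] d = new double[3];
        vectorAddVersioned(a, b, d, 3);
        System.out.println(Arrays.equals(c, d));
    }
}
```

On a standard JVM both methods run on the CPU. The point of the sketch is only that the programmer writes the stream form and leaves memory transfers, kernel generation, and exception-check elimination to the runtime.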

Tags: Code generation, Compilers, Computer science, CUDA, Java, nVidia, OpenCL, Tesla K40

October 27, 2015 by ahayashi
