Transparent use of Java objects on the GPU in the JaMP/OpenMP framework

Carolin Wolf

Friedrich-Alexander-Universität Erlangen-Nürnberg, 2013

Many computationally intensive applications benefit from parallel execution, whether on multiple CPU cores, through data-parallel GPGPU processing, or across several machines in a cluster. However, changing a program to run in parallel requires considerable effort and is therefore a time-consuming step during development. The implementation involves many steps that are not directly related to the computational problem: the required data must be accessible to every thread; programmers have to write code to start the computation threads and identify critical regions in order to synchronize them; and on a cluster they must also handle communication between the machines so that work and data are distributed appropriately. The Java/OpenMP (JaMP) project therefore deals with the automated parallelization of data-parallel algorithms. Programmers merely mark the for loops that should be executed in parallel with a special OpenMP parallel-for directive, and arrays carrying a special annotation are automatically distributed over the devices. JaMP takes over the remaining steps needed to obtain a parallel program, and it can exploit all resources in a cluster, running simultaneously on both the GPUs and the CPUs of multiple computers. In the past, the framework was developed to support primitive data types and arrays thereof. This thesis adds support for object-oriented programming. Execution with little framework overhead is achieved by creating flat structs from the members of Java classes, which avoids pointer serialization. Standard OpenMP shared objects are supported: JaMP replicates them on every device for the duration of the parallel-for loop's computation and takes care of updates to these objects by automatically merging them before the regular Java program continues after the loop. The JaMP feature that permits automatic distribution of arrays is maintained for object arrays, too. The resulting parallel program achieves a good speedup on several machines.
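
To make the flat-struct idea in the abstract more concrete, the following Java sketch copies the members of a small class into one contiguous array, runs a loop over that array, and copies the results back. It is only an illustration under stated assumptions, not the thesis's implementation: the Particle class, the helper names, and the three-float layout are hypothetical, and the spelling of the directive comment is assumed rather than taken from the JaMP documentation. A plain Java compiler treats the directive as an ordinary comment, so the loop runs sequentially here.

```java
public class FlatStructSketch {
    // Hypothetical example class; it does not come from the thesis.
    static final class Particle {
        float x, y, mass;

        Particle(float x, float y, float mass) {
            this.x = x;
            this.y = y;
            this.mass = mass;
        }
    }

    // Number of members per object in the assumed flat layout: x, y, mass.
    static final int FIELDS = 3;

    // Copy the members of every object into one contiguous array, so that
    // no object references (pointers) would have to be serialized.
    static float[] flatten(Particle[] ps) {
        float[] flat = new float[ps.length * FIELDS];
        for (int i = 0; i < ps.length; i++) {
            flat[i * FIELDS] = ps[i].x;
            flat[i * FIELDS + 1] = ps[i].y;
            flat[i * FIELDS + 2] = ps[i].mass;
        }
        return flat;
    }

    // Write the flat values back into the original Java objects.
    static void unflatten(float[] flat, Particle[] ps) {
        for (int i = 0; i < ps.length; i++) {
            ps[i].x = flat[i * FIELDS];
            ps[i].y = flat[i * FIELDS + 1];
            ps[i].mass = flat[i * FIELDS + 2];
        }
    }

    // Data-parallel loop body: every iteration touches only its own element.
    static void scalePositions(float[] flat, float factor) {
        int n = flat.length / FIELDS;
        //#omp parallel for  (assumed directive spelling; ignored by plain javac)
        for (int i = 0; i < n; i++) {
            flat[i * FIELDS] *= factor;
            flat[i * FIELDS + 1] *= factor;
        }
    }

    public static void main(String[] args) {
        Particle[] ps = { new Particle(1f, 2f, 3f), new Particle(4f, 5f, 6f) };
        float[] flat = flatten(ps);
        scalePositions(flat, 2f);
        unflatten(flat, ps);
        System.out.println(ps[1].x + " " + ps[1].y + " " + ps[1].mass); // prints "8.0 10.0 6.0"
    }
}
```

In a framework such as the one described, the flattening and the copy back would presumably be generated automatically, so the programmer would write only the marked loop over ordinary objects.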

Tags: Computer science, CUDA, Java, nVidia, OpenMP, Tesla M1060, Thesis

February 12, 2014 by hgpu
