Object support for OpenMP-style programming of GPU clusters in Java
University of Erlangen-Nuremberg, Computer Science Department, Programming Systems Group, Erlangen, Germany
27th International Conference on Advanced Information Networking and Applications Workshops (WAINA 2013), 2013
@inproceedings{Pub.2013.tech.IMMD.IMMD2.object,
author={{C}arolin {W}olf and {G}eorg {D}otzler and {R}onald {V}eldema and {M}ichael {P}hilippsen},
title={{O}bject {S}upport for {O}pen{MP}-style {P}rogramming of {GPU} {C}lusters in {J}ava},
booktitle={{P}roceedings of the 27th {I}nternational {C}onference on {A}dvanced {I}nformation {N}etworking and {A}pplications {W}orkshops ({WAINA} 2013)},
year={2013},
editor={{IEEE Computer Society}},
location={{B}arcelona},
pages={1405--1410},
isbn={978-1-4673-6239-9}
}
For scientists, it is advantageous to use a high level of abstraction for programming their simulations, so that they can focus on the problem at hand instead of struggling with low-level details. However, current HPC clusters with multiple GPUs per node only offer explicit communication to and from the GPUs, require manual work to keep the data consistent, and often need explicit kernel programming. Moreover, known GPU programming frameworks are limited to a single GPU or a single machine and also rarely support objects. Our system removes the above restrictions. With a slight but necessary change in Java’s semantics, we achieve automatic distribution and efficient use of objects and arrays of objects on multiple GPUs in a cluster. On benchmarks that distribute arrays of objects over five machines with 10 GPUs, we achieve speedups of up to 4.9 compared to one node.
July 16, 2013 by hgpu