Exploiting Task-Parallelism on GPU Clusters via OmpSs and rCUDA Virtualization
Depto. de Ingenieria y Ciencia de Computadores, Universidad Jaume I, 12071-Castellon, Spain
1st IEEE Int. Workshop on Reengineering for Parallelism in Heterogeneous Parallel Platforms (RePara), 2015
@inproceedings{castello2015exploiting,
  title={Exploiting Task-Parallelism on GPU Clusters via OmpSs and rCUDA Virtualization},
  author={Castell{\'o}, Adri{\'a}n and Mayo, Rafael and Planas, Judit and Quintana-Ort{\'\i}, Enrique S.},
  booktitle={1st IEEE International Workshop on Reengineering for Parallelism in Heterogeneous Parallel Platforms (RePara)},
  year={2015}
}
OmpSs is a task-parallel programming model consisting of a reduced collection of OpenMP-like directives, a front-end compiler, and a runtime system. This directive-based programming interface helps developers accelerate their application's execution, e.g. in a cluster equipped with graphics processing units (GPUs), with a low programming effort. On the other hand, the virtualization package rCUDA provides seamless and transparent remote access to any CUDA GPU in a cluster, via the CUDA Driver and Runtime programming interfaces. In this paper we investigate the hurdles and practical advantages of combining these two technologies. Our experimental study targets two cluster configurations: a system where all the GPUs are located in a single cluster node; and a cluster with the GPUs distributed among the nodes. Two applications, the N-body particle simulation and the Cholesky factorization of a dense matrix, are employed to expose the bottlenecks and performance of a remote virtualization solution applied to these two OmpSs task-parallel codes.
October 8, 2015 by hgpu
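Because rCUDA interposes on the CUDA Driver and Runtime APIs, the application binary needs no source changes; the client is configured through environment variables. The fragment below is a hypothetical setup following the variable names documented in the rCUDA user's guide (`RCUDA_DEVICE_COUNT`, `RCUDA_DEVICE_j`); the hostnames and binary name are placeholders.

```shell
# rCUDA client configuration (hostnames and binary are illustrative):
# each RCUDA_DEVICE_j maps a local virtual GPU to server:gpu_index.
export RCUDA_DEVICE_COUNT=2       # number of remote GPUs made visible
export RCUDA_DEVICE_0=node1:0     # virtual GPU 0 -> GPU 0 on node1
export RCUDA_DEVICE_1=node2:0     # virtual GPU 1 -> GPU 0 on node2
./nbody_ompss                     # unmodified OmpSs binary, linked against rCUDA
```

This is what enables the paper's second configuration (GPUs distributed among the nodes): the OmpSs runtime sees `RCUDA_DEVICE_COUNT` devices as if they were local.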