high performance computing on graphics processing units: hgpu.org

Exploiting Task-Parallelism on GPU Clusters via OmpSs and rCUDA Virtualization

Adrian Castello, Rafael Mayo, Judit Planas, Enrique S. Quintana-Orti

Depto. de Ingenieria y Ciencia de Computadores, Universidad Jaume I, 12071-Castellon, Spain

1st IEEE Int. Workshop on Reengineering for Parallelism in Heterogeneous Parallel Platforms (RePara), 2015

OmpSs is a task-parallel programming model consisting of a reduced collection of OpenMP-like directives, a front-end compiler, and a runtime system. This directive-based programming interface helps developers accelerate their applications, e.g., on a cluster equipped with graphics processing units (GPUs), with low programming effort. The virtualization package rCUDA, in turn, provides seamless and transparent remote access to any CUDA GPU in a cluster via the CUDA Driver and Runtime programming interfaces. In this paper we investigate the hurdles and practical advantages of combining these two technologies. Our experimental study targets two cluster configurations: a system where all the GPUs are located in a single cluster node, and a cluster with the GPUs distributed among the nodes. Two applications, the N-body particle simulation and the Cholesky factorization of a dense matrix, are employed to expose the bottlenecks and performance of a remote virtualization solution applied to these two OmpSs task-parallel codes.
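
To make the task-parallel structure of one of the two benchmarks concrete, here is a minimal sketch of a tiled (blocked) Cholesky factorization. It is not the paper's code: it is plain sequential Python, the kernel names `potrf`, `trsm`, and `update` merely follow BLAS/LAPACK naming conventions, and the `tasks` list is a hypothetical stand-in for the point where a task-parallel runtime would create one task per kernel call and track the tile each call reads and writes.

```python
import math


def potrf(a):
    """Factor one tile in place: a = l * l^T, l stored in the lower triangle."""
    n = len(a)
    for j in range(n):
        d = a[j][j] - sum(a[j][k] ** 2 for k in range(j))
        a[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(a[i][k] * a[j][k] for k in range(j))
            a[i][j] = (a[i][j] - s) / a[j][j]
        for i in range(j):
            a[i][j] = 0.0  # clear the strictly upper part of column j


def trsm(l, b):
    """Triangular solve on one tile: b := b * l^{-T}, l lower triangular."""
    n = len(l)
    for row in b:
        for c in range(n):
            s = sum(row[p] * l[c][p] for p in range(c))
            row[c] = (row[c] - s) / l[c][c]


def update(a, b, c):
    """Trailing update on one tile: c := c - a * b^T (syrk when a is b)."""
    n = len(c)
    for i in range(n):
        for j in range(n):
            c[i][j] -= sum(a[i][k] * b[j][k] for k in range(n))


def tiled_cholesky(t):
    """Factor a symmetric positive definite matrix stored as square tiles.

    t[i][j] is the tile in block row i, block column j; only tiles with
    i >= j are read or written. Returns the kernel calls in program order;
    each entry marks where a task-parallel runtime would spawn a task
    (assumption: one task per kernel call).
    """
    nt = len(t)
    tasks = []
    for k in range(nt):
        potrf(t[k][k])
        tasks.append(("potrf", k, k))
        for i in range(k + 1, nt):
            trsm(t[k][k], t[i][k])
            tasks.append(("trsm", i, k))
        for i in range(k + 1, nt):
            for j in range(k + 1, i):
                update(t[i][k], t[j][k], t[i][j])
                tasks.append(("gemm", i, j))
            update(t[i][k], t[i][k], t[i][i])
            tasks.append(("syrk", i, i))
    return tasks
```

Within one iteration of `k`, the `trsm` calls touch different tiles and are independent of each other, as are the trailing updates, which is the kind of concurrency a task-based runtime can spread across several GPUs. The tile size then determines how much data each task moves, which matters when the GPU is accessed remotely.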

Tags: Computer science, CUDA, GPU cluster, nVidia, OmpSs, Tesla C2050, Tesla M2050, Virtualization

October 8, 2015 by hgpu
