high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Computer science » MACC: An OpenACC Transpiler for Automatic Multi-GPU Use

MACC: An OpenACC Transpiler for Automatic Multi-GPU Use

Kazuaki Matsumura, Mitsuhisa Sato, Taisuke Boku, Artur Podobas, Satoshi Matsuoka

Tokyo Institute of Technology, Tokyo, Japan

In: Yokota R., Wu W. (eds) Supercomputing Frontiers. SCFA 2018. Lecture Notes in Computer Science, vol 10776. Springer, 2018

DOI:10.1007/978-3-319-69953-0_7

BibTeX

Download (PDF)

View

Source

1890

views

Graphics Processing Units (GPUs) perform the majority of computations in state-of-the-art supercomputers. Programming these GPUs is often assisted using a programming model such as (amongst others) the directive-driven OpenACC. Unfortunately, OpenACC (and other similar models) are incapable of automatically targeting and distributing work across several GPUs, which decreases productivity and forces needless manual labor upon programmers. We propose a method that enables OpenACC applications to target multi-GPU. Workload distribution, data transfer and inter-GPU communication (including modern GPU-to-GPU links) are automatically and transparently handled by our compiler with no user intervention and no changes to the program code. Our method leverages existing OpenMP and OpenACC backends, ensuring easy integration into existing HPC infrastructure. Empirically we quantify performance gains and losses in our data coherence method compared to similar approaches, and also show that our approach can compete with the performance of hand-written MPI code.

Tags: Compilers, Computer science, nVidia, OpenACC, OpenMP, Tesla P100

March 22, 2018 by hgpu

No votes yet.

Please wait...

Your response

You must be logged in to post a comment.

* * *

high performance computing on graphics processing units: hgpu.org

MACC: An OpenACC Transpiler for Automatic Multi-GPU Use

Your response

Recent source codes

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

KISim: Kubernetes Intelligent Scheduling Simulator

Efficient GPU Implementation of Multi-Precision Integer Division

exa-AMD: Exascale Accelerated Materials Discovery

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Most viewed papers (last 30 days)

MACC: An OpenACC Transpiler for Automatic Multi-GPU Use

Share this:

Your response

Recent source codes

Most viewed papers (last 30 days)