GPU-aware Communication with UCX in Parallel Programming Models: Charm++, MPI, and Python

Jaemin Choi, Zane Fink, Sam White, Nitin Bhat, David F. Richards, Laxmikant V. Kale

Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA

arXiv:2102.12416 [cs.DC], (24 Feb 2021)

Source code package: Charm++, a message-passing parallel language and runtime system

As an increasing number of leadership-class systems embrace GPU accelerators in the race towards exascale, efficient communication of GPU data is becoming one of the most critical components of high-performance computing. For developers of parallel programming models, implementing support for GPU-aware communication using native APIs for GPUs such as CUDA can be a daunting task as it requires considerable effort with little guarantee of performance. In this work, we demonstrate the capability of the Unified Communication X (UCX) framework to compose a GPU-aware communication layer that serves multiple parallel programming models developed out of the Charm++ ecosystem, including MPI and Python: Charm++, Adaptive MPI (AMPI), and Charm4py. We demonstrate the performance impact of our designs with microbenchmarks adapted from the OSU benchmark suite, obtaining improvements in latency of up to 10.2x, 11.7x, and 17.4x in Charm++, AMPI, and Charm4py, respectively. We also observe increases in bandwidth of up to 9.6x in Charm++, 10x in AMPI, and 10.5x in Charm4py. We show the potential impact of our designs on real-world applications by evaluating weak and strong scaling performance of a proxy application that performs the Jacobi iterative method, improving the communication performance by up to 12.4x in Charm++, 12.8x in AMPI, and 19.7x in Charm4py.

Tags: Benchmarking, Computer science, CUDA, Distributed computing, MPI, nVidia, Package, Python, Tesla V100

February 28, 2021 by hgpu
