high performance computing on graphics processing units: hgpu.org

GPU-Aware Non-contiguous Data Movement In Open MPI

Wei Wu, George Bosilca, Rolf vandeVaart, Sylvain Jeaugey, Jack Dongarra

University of Tennessee, Knoxville, USA

25th International Symposium on High-Performance Parallel and Distributed Computing (HPDC’16), 2016

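@inproceedings{wu2016gpu,
  title={GPU-Aware Non-contiguous Data Movement In Open MPI},
  author={Wu, Wei and Bosilca, George and vandeVaart, Rolf and Jeaugey, Sylvain and Dongarra, Jack},
  booktitle={25th International Symposium on High-Performance Parallel and Distributed Computing (HPDC'16)},
  year={2016}
}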

Owing to their higher parallel density and power efficiency, GPUs have become increasingly popular in scientific applications. Many of these applications are built on the ubiquitous Message Passing Interface (MPI) programming paradigm and use non-contiguous memory layouts to exchange data between processes. However, support for efficient non-contiguous data movement for GPU-resident data is still in its infancy, which degrades overall application performance. To address this shortcoming, we present a solution that exploits the inherent parallelism of datatype packing and unpacking operations. We developed a close integration between Open MPI’s stack-based datatype engine, NVIDIA’s Unified Memory Architecture, and GPUDirect capabilities. In this design, the datatype packing and unpacking operations are offloaded to the GPU and handled by specialized GPU kernels, while the CPU remains the driver for data movement between nodes. By incorporating our design into the Open MPI library, we show significantly better performance for non-contiguous GPU-resident data transfers on both shared- and distributed-memory machines.
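To make "packing" and "unpacking" of a non-contiguous layout concrete, the following is a minimal CPU-side sketch in Python. It is not Open MPI's datatype engine or the paper's GPU kernels. It assumes a simple strided layout described by the same three parameters as `MPI_Type_vector` (count, blocklength, stride), and the function names `source_index`, `pack`, and `unpack` are hypothetical. The point it illustrates is that each packed element's source position can be computed independently of the others, which is the kind of parallelism a GPU kernel can exploit.

```python
# Illustrative sketch only: NOT Open MPI's datatype engine or its GPU kernels.
# Layout parameters mirror MPI_Type_vector(count, blocklength, stride, ...),
# with stride measured in elements.

def source_index(i, blocklength, stride):
    """Map packed position i to its position in the strided source buffer."""
    block, offset = divmod(i, blocklength)
    return block * stride + offset

def pack(src, count, blocklength, stride):
    """Gather `count` blocks of `blocklength` elements into a contiguous list."""
    n = count * blocklength
    # Each iteration is independent of the others, so on a GPU every value
    # of i could be handled by its own thread.
    return [src[source_index(i, blocklength, stride)] for i in range(n)]

def unpack(packed, dst, count, blocklength, stride):
    """Scatter a contiguous buffer back into the strided layout, in place."""
    for i, value in enumerate(packed[: count * blocklength]):
        dst[source_index(i, blocklength, stride)] = value
    return dst

# A 4x6 row-major matrix; select its first two columns
# (4 blocks of 2 elements, consecutive blocks 6 elements apart).
matrix = list(range(24))
packed = pack(matrix, count=4, blocklength=2, stride=6)
restored = unpack(packed, [None] * 24, count=4, blocklength=2, stride=6)
```

Here `packed` holds the two selected columns contiguously, and `restored` has those values back at their strided positions, with every other slot left untouched.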

Tags: Computer science, CUDA, MPI, nVidia, Tesla K40

April 26, 2016 by hgpu
