
Efficient Intranode Communication in GPU-Accelerated Systems

Feng Ji, Ashwin M. Aji, James Dinan, Darius Buntinas, Pavan Balaji, Wu-chun Feng, Xiaosong Ma
Department of Computer Science, North Carolina State University
2nd IEEE International Workshop on Accelerators and Hybrid Exascale Systems (in conjunction with the 26th IEEE International Parallel and Distributed Processing Symposium), 2012

@InProceedings{aji-intranode-comm-ashes12,
   author    = {Ji, Feng and Aji, Ashwin M. and Dinan, James and Buntinas, Darius and Balaji, Pavan and Feng, Wu-chun and Ma, Xiaosong},
   title     = {{Efficient Intranode Communication in GPU-Accelerated Systems}},
   booktitle = {2nd IEEE International Workshop on Accelerators and Hybrid Exascale Systems (in conjunction with the 26th IEEE International Parallel and Distributed Processing Symposium)},
   address   = {Shanghai, China},
   month     = {May},
   year      = {2012}
}


Current implementations of MPI are unaware of accelerator memory (i.e., GPU device memory) and require programmers to explicitly move data between memory spaces. This approach is inefficient, especially for intranode communication, where it can result in several extra copy operations. In this work, we integrate GPU-awareness into a popular MPI runtime system and develop techniques to significantly reduce the cost of intranode communication involving one or more GPUs. Experimental results show up to a 2x increase in bandwidth, yielding an average improvement of 4.3% in the total execution time of a halo exchange benchmark.
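
For illustration, below is a minimal sketch (not the authors' implementation) contrasting the explicit-staging pattern the abstract describes with a GPU-aware call path in which a device pointer is passed directly to MPI. It assumes a CUDA-aware MPI library (e.g., MPICH or MVAPICH2) and exactly two ranks; the buffer size, tags, and variable names are illustrative.

/* Sketch: manual host staging vs. GPU-aware MPI for an
 * intranode transfer between two ranks. Assumes a CUDA-aware
 * MPI build; run with 2 ranks (e.g., mpiexec -n 2 ./a.out). */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

#define N (1 << 20)  /* 1M floats, illustrative message size */

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    float *d_buf;  /* GPU device buffer */
    cudaMalloc((void **)&d_buf, N * sizeof(float));

    /* Traditional approach: stage through a host buffer by hand,
     * incurring extra device-to-host/host-to-device copies. */
    float *h_buf = (float *)malloc(N * sizeof(float));
    if (rank == 0) {
        cudaMemcpy(h_buf, d_buf, N * sizeof(float), cudaMemcpyDeviceToHost);
        MPI_Send(h_buf, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(h_buf, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        cudaMemcpy(d_buf, h_buf, N * sizeof(float), cudaMemcpyHostToDevice);
    }

    /* GPU-aware approach: pass the device pointer directly; the
     * runtime moves the data internally, which is what lets it
     * reduce or elide copies for intranode transfers. */
    if (rank == 0)
        MPI_Send(d_buf, N, MPI_FLOAT, 1, 1, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(d_buf, N, MPI_FLOAT, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    free(h_buf);
    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}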
