high performance computing on graphics processing units: hgpu.org

DMA-Assisted, Intranode Communication in GPU Accelerated Systems

Feng Ji, Ashwin M. Aji, James Dinan, Darius Buntinas, Pavan Balaji, Rajeev Thakur, Wu-Chun Feng, Xiaosong Ma

Department of Computer Science, North Carolina State University

14th IEEE International Conference on High Performance Computing and Communications (HPCC), 2012

@article{ji2012dma,
  title={DMA-Assisted, Intranode Communication in GPU Accelerated Systems},
  author={Ji, F. and Aji, A.M. and Dinan, J. and Buntinas, D. and Balaji, P. and Thakur, R. and Feng, W. and Ma, X.},
  year={2012}
}

Accelerator awareness has become a pressing issue in data movement models, such as MPI, because of the rapid deployment of systems that utilize accelerators. In our previous work, we developed techniques to enhance MPI with accelerator awareness, thus allowing applications to easily and efficiently communicate data between accelerator memories. In this paper, we extend this work with techniques to perform efficient data movement between accelerators within the same node using a DMA-assisted, peer-to-peer intranode communication technique that was recently introduced for NVIDIA GPUs. We present a detailed design of our new approach to intranode communication and evaluate its improvement to communication and application performance using micro-kernel benchmarks and a 2D stencil application kernel.

Tags: Benchmarking, Computer science, CUDA, MPI, nVidia, Tesla M2070

June 6, 2012 by hgpu
