high performance computing on graphics processing units: hgpu.org

Message passing on data-parallel architectures

Jeff A. Stuart, John D. Owens

Department of Computer Science, University of California, Davis

In: IEEE International Symposium on Parallel & Distributed Processing (IPDPS’09), 2009, pp. 1-12

DOI:10.1109/IPDPS.2009.5161065

This paper explores the challenges of implementing a message-passing interface usable on systems with data-parallel processors. As a case study, we design and implement DCGN, an MPI-like API for NVIDIA GPUs that allows full access to the underlying architecture. We introduce the notion of data-parallel thread-groups as a way to map resources to MPI ranks. Our method also allows the data-parallel processors to run autonomously from user-written CPU code. To facilitate communication, we use a sleep-based polling system to store and retrieve messages. Unlike previous systems, our method provides both performance and flexibility. By running a test suite of applications with different communication requirements, we find that the overhead incurred is tolerable, between one and five percent depending on the application, and we identify where this overhead accumulates. We conclude that, with innovations in chipsets and drivers, this overhead will be mitigated, yielding performance similar to typical CPU-based MPI implementations while providing fully dynamic communication.

Tags: Computer science, GPU cluster, MPI, Networks, nVidia, nVidia GeForce GTX 280

January 6, 2011 by hgpu
