high performance computing on graphics processing units: hgpu.org

An Asynchronous Dataflow-Driven Execution Model For Distributed Accelerator Computing

Philip Salzmann, Fabian Knorr, Peter Thoman, Philipp Gschwandtner, Biagio Cosenza, Thomas Fahringer

Distributed and Parallel Systems Group, University of Innsbruck, Austria

23rd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2023

@article{salzmann2023asynchronous,
  title={An Asynchronous Dataflow-Driven Execution Model For Distributed Accelerator Computing},
  author={Salzmann, Philip and Knorr, Fabian and Thoman, Peter and Gschwandtner, Philipp and Cosenza, Biagio and Fahringer, Thomas},
  year={2023}
}

Source code package: Test benchmarks for CELERITY

While domain-specific HPC software packages continue to thrive and are vital to many scientific communities, a general-purpose, high-productivity GPU cluster programming model that facilitates experimentation for non-experts remains elusive. We demonstrate how Celerity, a high-level C++ programming model for distributed accelerator computing based on the open SYCL standard, allows for the quick development of, and experimentation with, distributed applications. To achieve scalability on large machines, we replace Celerity’s existing master/worker scheduling model with a fully distributed scheme that reduces the worst-case scheduling complexity from quadratic to linear while maintaining the existing programming interface. We then show how this declarative, dataflow-based API, paired with a point-to-point communication model with eager data pushing, can effectively expose and leverage opportunities for latency hiding and computation/communication overlapping with minimal or no manual guidance. We demonstrate that Celerity scales very well on multiple benchmarks from several scientific domains on up to 128 GPUs.
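To make the idea of eager, point-to-point data pushing more concrete, the following is a toy sketch and not Celerity's actual API or scheduler. It assumes a 1D buffer split into contiguous chunks across nodes, written one-to-one and then read by a stencil kernel with a small halo. Because the access pattern is declared up front, each node can work out locally which of its ranges every peer will need, without asking a central master. All function names (`split`, `neighborhood`, `pushes_from`) are hypothetical and chosen only for illustration.

```python
def split(n, num_nodes):
    """Split [0, n) into num_nodes contiguous half-open chunks (assumed static split)."""
    base, extra = divmod(n, num_nodes)
    chunks, start = [], 0
    for node in range(num_nodes):
        end = start + base + (1 if node < extra else 0)
        chunks.append((start, end))
        start = end
    return chunks


def neighborhood(chunk, n, radius=1):
    """Range a stencil kernel reads for a chunk: the chunk plus a halo, clamped to [0, n)."""
    return (max(0, chunk[0] - radius), min(n, chunk[1] + radius))


def intersect(a, b):
    """Intersection of two half-open ranges, or None if they do not overlap."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo < hi else None


def pushes_from(node, n, num_nodes, radius=1):
    """Ranges that `node` should push to each peer, computed purely locally.

    Assumes `node` has just written its own chunk and that every peer will
    next read its chunk plus a halo of `radius` elements.
    """
    chunks = split(n, num_nodes)
    owned = chunks[node]
    pushes = {}
    for peer, chunk in enumerate(chunks):
        if peer == node:
            continue
        needed = intersect(owned, neighborhood(chunk, n, radius))
        if needed is not None:
            pushes[peer] = needed
    return pushes


if __name__ == "__main__":
    # 12 elements over 3 nodes: chunks are [0,4), [4,8), [8,12).
    for node in range(3):
        print(node, pushes_from(node, 12, 3))
```

In this sketch every node evaluates only its own outgoing transfers, so the per-node work grows with the number of peers rather than with the number of node pairs. A producer can start sending as soon as its chunk is written, which is the kind of opportunity for overlapping communication with computation that the abstract refers to.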

Tags: Benchmarking, Computer science, CUDA, GPU cluster, MPI, nVidia, nVidia V100, Package, SYCL

May 21, 2023 by hgpu
