high performance computing on graphics processing units: hgpu.org

Blink: Fast and Generic Collectives for Distributed ML

Guanhua Wang, Shivaram Venkataraman, Amar Phanishayee, Jorgen Thelin, Nikhil Devanur, Ion Stoica

Microsoft Research

arXiv:1910.04940 [cs.DC], (11 Oct 2019)

@misc{wang2019blink,
  title={Blink: Fast and Generic Collectives for Distributed ML},
  author={Guanhua Wang and Shivaram Venkataraman and Amar Phanishayee and Jorgen Thelin and Nikhil Devanur and Ion Stoica},
  year={2019},
  eprint={1910.04940},
  archivePrefix={arXiv},
  primaryClass={cs.DC}
}

Model parameter synchronization across GPUs introduces high overheads for data-parallel training at scale. Existing parameter synchronization protocols cannot effectively leverage available network resources in the face of ever-increasing hardware heterogeneity. To address this, we propose Blink, a collective communication library that dynamically generates optimal communication primitives by packing spanning trees. We propose techniques to minimize the number of trees generated and extend Blink to leverage heterogeneous communication channels for faster data transfers. Evaluations show that, compared with the state of the art (NCCL), Blink can achieve up to 8x faster model synchronization and reduce end-to-end training time for image classification tasks by up to 40%.
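
The abstract does not spell out how spanning trees are packed, so the following is only an illustrative sketch of the general idea, not Blink's actual algorithm. It assumes a hypothetical topology of four fully connected GPUs with one unit-capacity link per pair, and uses a simple greedy heuristic: repeatedly find a spanning tree by depth-first search over links that still have capacity, then consume those links. All names (`find_spanning_tree`, `greedy_pack_trees`) are made up for illustration.

```python
from itertools import combinations


def find_spanning_tree(nodes, capacity, root):
    """Depth-first search for a spanning tree rooted at `root`.

    Only links with remaining capacity are used. Returns a list of
    (parent, child) edges, or None if some node is unreachable.
    """
    visited = {root}
    edges = []

    def visit(u):
        for v in nodes:
            if v not in visited and capacity.get(frozenset((u, v)), 0) > 0:
                visited.add(v)
                edges.append((u, v))
                visit(v)

    visit(root)
    return edges if len(visited) == len(nodes) else None


def greedy_pack_trees(nodes, links, root):
    """Greedily pack spanning trees until no further tree can be found.

    This is a heuristic for illustration only; it does not guarantee
    the maximum number of trees on every topology.
    """
    capacity = dict(links)  # work on a copy so the caller's dict is untouched
    trees = []
    while True:
        tree = find_spanning_tree(nodes, capacity, root)
        if tree is None:
            return trees
        for parent, child in tree:
            capacity[frozenset((parent, child))] -= 1
        trees.append(tree)


# Hypothetical topology: 4 GPUs, one unit-capacity link between every pair.
nodes = [0, 1, 2, 3]
links = {frozenset(pair): 1 for pair in combinations(nodes, 2)}
trees = greedy_pack_trees(nodes, links, root=0)
for i, tree in enumerate(trees):
    print(f"tree {i}: {tree}")
```

With several trees available, a broadcast from the root could send a different slice of the buffer down each tree, so that more links carry data at the same time than with a single tree or ring.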

Tags: Computer science, CUDA, Distributed computing, Heterogeneous systems, Machine learning, nVidia, nVidia DGX-1, Tesla V100

October 20, 2019 by hgpu
