MiMatrix: A Massively Distributed Deep Learning Framework on a Petascale High-density Heterogeneous Cluster
Emerging Technology Center, Midea Corporate Research Center, San Jose, CA, USA
arXiv:1802.02326 [cs.CV], 7 Feb 2018
@article{chen2018mimatrix,
title={MiMatrix: A Massively Distributed Deep Learning Framework on a Petascale High-density Heterogeneous Cluster},
author={Chen, Xin and Zhou, Hua and Gao, Yuxiang and Zhu, Yu and Wang, Dongyan},
year={2018},
month={feb},
eprint={1802.02326},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
In this paper, we present a co-designed petascale high-density GPU cluster to expedite distributed deep learning training with synchronous Stochastic Gradient Descent (SSGD). The architecture of our heterogeneous cluster is inspired by the Harvard architecture: nodes are configured with different specifications according to their roles in the system. Based on the topology of the whole system's network and the properties of the different node types, we develop and implement a novel job server parallel software framework, named MiMatrix, for distributed deep learning training. Compared with the parameter server framework, in which the parameter server is a data-transfer bottleneck in the AllReduce step of SSGD, the job server undertakes all controlling, scheduling and monitoring tasks without transferring model data. In MiMatrix, we propose a novel GPUDirect Remote Direct Memory Access (RDMA)-aware parallel AllReduce algorithm executed by the computing servers, in which both the computation and the handshake messages are O(1) at each epoch.
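The listing contains no code, but the gradient-aggregation pattern the abstract refers to can be sketched briefly. The Python/NumPy snippet below is not MiMatrix: it does not use GPUDirect RDMA, the job server, or the authors' specific AllReduce schedule (which may differ). It only simulates, in a single process, how a ring-style AllReduce (reduce-scatter followed by all-gather) sums per-worker gradients so that every worker finishes an SSGD step with the same aggregated gradient, with per-worker traffic independent of the number of workers. All names here (ring_allreduce, world_size, etc.) are illustrative, not from the paper.

import numpy as np

def ring_allreduce(grads):
    # Sum-reduce equal-length gradient vectors (one per worker) using the
    # ring schedule: a reduce-scatter phase followed by an all-gather phase.
    n = len(grads)
    # Split each worker's gradient into n contiguous segments.
    segs = [list(np.array_split(g.astype(np.float64), n)) for g in grads]

    # Reduce-scatter: after n-1 steps, worker i holds the complete sum of
    # segment (i + 1) % n.
    for step in range(n - 1):
        for i in range(n):
            s = (i - step) % n          # segment worker i forwards this step
            dst = (i + 1) % n           # ring neighbour
            segs[dst][s] = segs[dst][s] + segs[i][s]

    # All-gather: after n-1 more steps, every worker holds every reduced segment.
    for step in range(n - 1):
        for i in range(n):
            s = (i + 1 - step) % n      # fully reduced segment being circulated
            dst = (i + 1) % n
            segs[dst][s] = segs[i][s].copy()

    return [np.concatenate(worker_segs) for worker_segs in segs]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    world_size, dim = 4, 10
    grads = [rng.standard_normal(dim) for _ in range(world_size)]

    reduced = ring_allreduce(grads)
    expected = np.sum(grads, axis=0)
    assert all(np.allclose(r, expected) for r in reduced)

    # Synchronous SGD: every worker then applies the identical averaged gradient.
    avg_grad = reduced[0] / world_size
    print("allreduce matches direct sum:", np.allclose(reduced[0], expected))

In an actual distributed run the two inner loops would be replaced by point-to-point transfers between neighbouring computing servers (e.g. over RDMA), so each worker sends and receives roughly one gradient's worth of data per iteration regardless of cluster size.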
February 9, 2018 by hgpu