ARK: GPU-driven Code Execution for Distributed Deep Learning

Changho Hwang, KyoungSoo Park, Ran Shu, Xinyuan Qu, Peng Cheng, Yongqiang Xiong
KAIST
20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2023), 2023

@article{hwang2023ark,
   title={ARK: GPU-driven Code Execution for Distributed Deep Learning},
   author={Hwang, Changho and Park, KyoungSoo and Shu, Ran and Qu, Xinyuan and Cheng, Peng and Xiong, Yongqiang},
   year={2023}
}


Modern state-of-the-art deep learning (DL) applications tend to scale out to a large number of parallel GPUs. Unfortunately, we observe that the collective communication overhead across GPUs is often the key limiting factor of performance for distributed DL. Collective communication under-utilizes the network bandwidth because it frequently transfers small data chunks, which also incurs substantial I/O overhead on the GPU that interferes with GPU computation. The root cause lies in the inefficiency of CPU-based communication event handling and in the inability of GPU threads to control the GPU’s internal DMA engine. To address the problem, we propose a GPU-driven code execution system that leverages a GPU-controlled hardware DMA engine for I/O offloading. Our custom DMA engine pipelines multiple DMA requests to support efficient small data transfers while eliminating the I/O overhead on GPU cores. Unlike existing GPU DMA engines, which can be initiated only by the CPU, we let GPU threads directly control DMA operations. This leads to a highly efficient system in which GPUs drive their own execution flow and handle communication events autonomously, without CPU intervention. Our prototype DMA engine achieves line rate for message sizes as small as 8KB (3.9x better throughput) with only 4.3µs of communication latency (9.1x faster), while incurring little interference with computation on the GPU, and achieves 1.8x higher all-reduce throughput in a real training workload.
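
The claim that frequent small transfers under-utilize bandwidth, and that pipelining multiple DMA requests helps, can be illustrated with a back-of-envelope model. The sketch below is not taken from the paper: it assumes a simple fixed-overhead-plus-wire-time cost per message, and the function names, the 100 Gbit/s link rate, and the 10 µs per-message overhead are all hypothetical values chosen only for illustration.

```python
def effective_throughput_gbps(message_bytes, overhead_us, link_gbps):
    """Throughput (Gbit/s) when every message pays a fixed overhead
    and only one message is in flight at a time (assumed model)."""
    bits = message_bytes * 8
    # A link of X Gbit/s moves X * 1e3 bits per microsecond.
    wire_time_us = bits / (link_gbps * 1e3)
    return bits / ((overhead_us + wire_time_us) * 1e3)


def pipelined_throughput_gbps(message_bytes, overhead_us, link_gbps, depth):
    """Same model with `depth` requests in flight, capped at the link rate.
    Overlapping requests hides the fixed overhead (idealized assumption)."""
    single = effective_throughput_gbps(message_bytes, overhead_us, link_gbps)
    return min(link_gbps, depth * single)


if __name__ == "__main__":
    LINK_GBPS = 100.0    # hypothetical link rate, not from the paper
    OVERHEAD_US = 10.0   # hypothetical per-message overhead, not from the paper
    for size in (1 << 10, 8 << 10, 64 << 10, 1 << 20):
        one = effective_throughput_gbps(size, OVERHEAD_US, LINK_GBPS)
        piped = pipelined_throughput_gbps(size, OVERHEAD_US, LINK_GBPS, depth=16)
        print(f"{size:>8} B  unpipelined={one:6.2f} Gbit/s  depth16={piped:6.2f} Gbit/s")
```

Under these assumptions, the fixed overhead dominates for small messages, so unpipelined throughput stays well below the link rate, while keeping several requests in flight moves it toward the link rate. The real engine's behavior depends on hardware details that this model does not capture.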
