An Experimental Study of SYCL Task Graph Parallelism for Large-Scale Machine Learning Workloads

Cheng-Hsiang Chiu, Dian-Lun Lin, Tsung-Wei Huang
University of Utah, Salt Lake City, UT, USA
EasyChair Preprint no. 6531, 2021

@techreport{chiu2021experimental,
  title={An Experimental Study of SYCL Task Graph Parallelism for Large-Scale Machine Learning Workloads},
  author={Chiu, Cheng-Hsiang and Lin, Dian-Lun and Huang, Tsung-Wei},
  year={2021},
  institution={EasyChair},
  number={6531}
}


Task graph parallelism has emerged as an important tool for efficiently executing large machine learning workloads on GPUs. Users describe a GPU workload as a task dependency graph rather than as aggregated GPU operations and dependencies, allowing the runtime to perform whole-graph scheduling optimization that significantly improves performance. While the new CUDA graph execution model has demonstrated significant success on this front, the counterpart for SYCL, a general-purpose heterogeneous programming model using standard C++, remains nascent. Unlike CUDA graph, the SYCL runtime leverages out-of-order queues to implicitly create a task execution graph induced by data dependencies. For explicit task dependencies, users are responsible for creating SYCL events and synchronizing them at a non-negligible cost. Furthermore, there is no specialized graph execution model that allows users to offload a task graph directly onto a SYCL device in a way similar to CUDA graph. This paper conducts an experimental study of SYCL’s default task graph parallelism by comparing it with CUDA graph on large-scale machine learning workloads from the recent HPEC Graph Challenge. Our results highlight the need for a new SYCL graph execution model in the standard.
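The explicit event-based dependency mechanism the abstract refers to can be illustrated with a minimal sketch, assuming a SYCL 2020 implementation (e.g., DPC++) and illustrative kernel sizes not taken from the paper. Each `submit` returns a `sycl::event`, and the user must thread these events through `depends_on` by hand to build the task graph; this per-edge event creation and synchronization is the cost the paper measures against CUDA graph's one-shot graph launch.

```cpp
// Sketch of explicit task dependencies in SYCL 2020 (not from the paper).
// A default sycl::queue is out-of-order, so without events or data
// dependencies the two kernels below could run in any order.
#include <sycl/sycl.hpp>

int main() {
  sycl::queue q;  // out-of-order queue by default
  constexpr size_t N = 1024;
  float* a = sycl::malloc_device<float>(N, q);
  float* b = sycl::malloc_device<float>(N, q);

  // Task 1: producer kernel; its event is the graph node handle.
  sycl::event e1 = q.submit([&](sycl::handler& h) {
    h.parallel_for(N, [=](sycl::id<1> i) { a[i] = 1.0f; });
  });

  // Task 2: consumer kernel; the user must state the edge explicitly.
  sycl::event e2 = q.submit([&](sycl::handler& h) {
    h.depends_on(e1);  // explicit edge in the induced task graph
    h.parallel_for(N, [=](sycl::id<1> i) { b[i] = 2.0f * a[i]; });
  });

  e2.wait();  // host-side synchronization on the final task
  sycl::free(a, q);
  sycl::free(b, q);
  return 0;
}
```

By contrast, CUDA graph lets the user capture or construct the whole dependency graph once and replay it with a single launch, which is the offloading capability the abstract notes is missing from SYCL.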
