
Composing Distributed Computations Through Task and Kernel Fusion

Rohan Yadav, Shiv Sundram, Wonchan Lee, Michael Garland, Michael Bauer, Alex Aiken, Fredrik Kjolstad
Stanford University, USA
arXiv:2406.18109 [cs.DC] (26 Jun 2024)

@misc{yadav2024composingdistributedcomputationstask,
   title={Composing Distributed Computations Through Task and Kernel Fusion},
   author={Rohan Yadav and Shiv Sundram and Wonchan Lee and Michael Garland and Michael Bauer and Alex Aiken and Fredrik Kjolstad},
   year={2024},
   eprint={2406.18109},
   archivePrefix={arXiv},
   primaryClass={cs.DC},
   url={https://arxiv.org/abs/2406.18109}
}


We introduce Diffuse, a system that dynamically performs task and kernel fusion in distributed, task-based runtime systems. The key component of Diffuse is an intermediate representation of distributed computation that enables the analyses required to fuse distributed tasks to be performed in a scalable manner. We pair task fusion with a JIT compiler that fuses together the kernels within fused tasks. We show empirically that Diffuse’s intermediate representation is general enough to serve as a target for two real-world, task-based libraries (cuNumeric and Legate Sparse), letting Diffuse find optimization opportunities across function and library boundaries. Diffuse accelerates unmodified applications built by composing task-based libraries by 1.86x on average (geometric mean), with speedups ranging from 0.93x to 10.7x on up to 128 GPUs. Diffuse also finds optimization opportunities missed by the original application developers, enabling high-level Python programs to match or exceed the performance of an explicitly parallel MPI library.
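To make the idea of kernel fusion concrete, here is a minimal single-process sketch. It is not Diffuse's intermediate representation or its JIT compiler: the `Task` record and the `run_unfused`/`run_fused` helpers are hypothetical names chosen for illustration, and the sketch assumes purely elementwise tasks, equal-length arrays, and that every task writes a fresh array name. Run unfused, the chain launches one loop per task and materializes every intermediate array. Run fused, it makes a single pass over the data and writes back only the arrays listed in `live_out`, eliminating temporaries, which is a common benefit of kernel fusion. The distributed, partition-aware analysis that the paper contributes is not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Store = Dict[str, List[float]]


@dataclass
class Task:
    """Hypothetical elementwise task: out[i] = fn(in_0[i], in_1[i], ...)."""
    fn: Callable[..., float]
    inputs: List[str]
    output: str


def run_unfused(tasks: List[Task], store: Store) -> Store:
    """One loop ('kernel launch') per task; every output array is materialized."""
    for t in tasks:
        n = len(store[t.inputs[0]])
        store[t.output] = [t.fn(*(store[a][i] for a in t.inputs)) for i in range(n)]
    return store


def run_fused(tasks: List[Task], store: Store, live_out: List[str]) -> Store:
    """A single loop applies the whole task chain to each element.

    Assumes every task writes a fresh array name and all arrays have the
    same length. Outputs not listed in live_out live only in the
    per-element environment and are never written back as full arrays.
    """
    produced = {t.output for t in tasks}
    external = sorted({a for t in tasks for a in t.inputs if a not in produced})
    n = len(store[external[0]])
    results: Store = {name: [] for name in live_out}
    for i in range(n):
        env = {a: store[a][i] for a in external}  # load each input element once
        for t in tasks:  # run the fused chain on element i
            env[t.output] = t.fn(*(env[a] for a in t.inputs))
        for name in live_out:
            results[name].append(env[name])
    store.update(results)
    return store


# Usage: c = (a + b) * 2 - a, expressed as three separate elementwise tasks.
tasks = [
    Task(lambda x, y: x + y, ["a", "b"], "t1"),
    Task(lambda t: t * 2.0, ["t1"], "t2"),
    Task(lambda t, x: t - x, ["t2", "a"], "c"),
]
a, b = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
unfused = run_unfused(tasks, {"a": a, "b": b})
fused = run_fused(tasks, {"a": a, "b": b}, live_out=["c"])
print(fused["c"])  # [9.0, 12.0, 15.0]
```

Both versions compute the same `c`, but only the unfused run allocates the intermediates `t1` and `t2`. A real fusion pass must also prove that the fused tasks access their data in compatible ways; this sketch sidesteps that question through its fresh-name and equal-length assumptions.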