Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters
Dept. of Computer Science, University of Illinois, Urbana-Champaign, United States
Proceedings of the Multicore and GPU Programming Models, Languages and Compilers Workshop (PLC’12) at IPDPS 2012
@article{lifflander2012dynamic,
title={Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters},
author={Lifflander, J. and Evans, G.C. and Arya, A. and Kale, L.V.},
year={2012}
}
Dynamic scheduling and varying decomposition granularity are well-known techniques for achieving high performance in parallel computing. Heterogeneous clusters with highly data-parallel processors, such as GPUs, present unique problems for applying these techniques. These systems reveal a dichotomy between grain sizes: decompositions ideal for the CPUs may yield insufficient data-parallelism for accelerators, and decompositions targeted at the GPU may decrease performance on the CPU. This problem is typically ameliorated by statically scheduling a fixed amount of work for agglomeration. However, determining the ideal amount of work to agglomerate requires experimentation, because it varies between architectures and problem configurations. This paper describes a novel methodology for dynamically agglomerating work units at runtime and scheduling them on accelerators. The approach is demonstrated in two applications: an n-body particle simulation, which offloads particle-interaction work, and a parallel dense LU solver, which relocates DGEMM kernels to the GPU. In both cases, dynamic agglomeration yields results comparable to or better than static scheduling across a variety of system configurations.
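To make the static-versus-dynamic contrast concrete, here is a minimal toy simulation. It is not the authors' runtime or algorithm; it assumes a simplified model in which the CPU side produces work units at known times, and each accelerator kernel launch costs a fixed overhead plus a per-unit cost. The names `GpuModel`, `launch_overhead`, `per_unit_cost`, `dynamic_agglomeration`, and `static_agglomeration` are hypothetical. The static policy waits for a fixed batch size before launching; the dynamic policy, whenever the accelerator is idle, agglomerates every unit that has already arrived into one launch.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class GpuModel:
    # Hypothetical cost model: fixed cost per kernel launch plus a
    # marginal cost for each agglomerated work unit in that launch.
    launch_overhead: float
    per_unit_cost: float

    def batch_time(self, n: int) -> float:
        return self.launch_overhead + self.per_unit_cost * n


def dynamic_agglomeration(arrivals, gpu, max_batch=None):
    """When the accelerator is idle, launch one kernel over every unit that
    has already arrived (optionally capped at max_batch).
    Returns (makespan, list of batch sizes)."""
    assert max_batch is None or max_batch >= 1
    pending = deque(sorted(arrivals))
    clock = 0
    batches = []
    while pending:
        # Nothing ready yet: the accelerator idles until the next unit arrives.
        if pending[0] > clock:
            clock = pending[0]
        batch = 0
        while pending and pending[0] <= clock and (max_batch is None or batch < max_batch):
            pending.popleft()
            batch += 1
        clock += gpu.batch_time(batch)
        batches.append(batch)
    return clock, batches


def static_agglomeration(arrivals, gpu, batch_size):
    """Wait until a fixed number of units has arrived, then launch them
    together. Returns (makespan, list of batch sizes)."""
    ready = sorted(arrivals)
    clock = 0
    batches = []
    for start in range(0, len(ready), batch_size):
        group = ready[start:start + batch_size]
        # A launch needs both a free accelerator and the group's last unit.
        clock = max(clock, group[-1]) + gpu.batch_time(len(group))
        batches.append(len(group))
    return clock, batches


if __name__ == "__main__":
    gpu = GpuModel(launch_overhead=5, per_unit_cost=1)
    arrivals = list(range(10))  # one unit produced per time step
    print("dynamic:", dynamic_agglomeration(arrivals, gpu))
    for size in (1, 2, 5, 10):
        print(f"static, batch {size}:", static_agglomeration(arrivals, gpu, size))
```

With these made-up costs, the dynamic policy finishes at time 25 (batches of 1, 6, and 3 units) with no tuning, while static batch sizes of 1, 2, and 5 finish at 60, 36, and 24. The best static size is only found by sweeping, and it would shift if the overhead or arrival rate changed; the dynamic policy adapts its batch size to whatever has accumulated while the accelerator was busy.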
March 9, 2012 by hgpu