high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Computer science » Dynamic Fine-Grain Scheduling of Pipeline Parallelism

Dynamic Fine-Grain Scheduling of Pipeline Parallelism

Daniel Sanchez, David Lo, Richard M. Yoo, Jeremy Sugerman, Christos Kozyrakis

Pervasive Parallelism Laboratory, Stanford University

20th International Conference on Parallel Architectures and Compilation Techniques (PACT ’11), 2011

@article{sanchez2011dynamic,

title={Dynamic Fine-Grain Scheduling of Pipeline Parallelism},

author={Sanchez, D. and Lo, D. and Yoo, R.M. and Sugerman, J. and Kozyrakis, C.},

year={2011}

}

Download (PDF)

View

Source

1767

views

Scheduling pipeline-parallel programs, defined as a graph of stages that communicate explicitly through queues, is challenging. When the application is regular and the underlying architecture can guarantee predictable execution times, several techniques exist to compute highly optimized static schedules. However, these schedules do not admit run-time load balancing, so variability introduced by the application or the underlying hardware causes load imbalance, hindering performance. On the other hand, existing schemes for dynamic fine-grain load balancing (such as task-stealing) do not work well on pipeline-parallel programs: they cannot guarantee memory footprint bounds, and do not adequately schedule complex graphs or graphs with ordered queues. We present a scheduler implementation for pipeline-parallel programs that performs fine-grain dynamic load balancing efficiently. Specifically, we implement the first real runtime for GRAMPS, a recently proposed programming model that focuses on supporting irregular pipeline and data-parallel applications (in contrast to classical stream programming models and schedulers, which require programs to be regular). Task-stealing with perstage queues and queuing policies, coupled with a backpressure mechanism, allow us to maintain strict footprint bounds, and a buffer management scheme based on packet-stealing allows lowoverhead and locality-aware dynamic allocation of queue data. We evaluate our runtime on a multi-core SMP and find that it provides low-overhead scheduling of irregular workloads while maintaining locality. We also show that the GRAMPS scheduler outperforms several other commonly used scheduling approaches. Specifically, while a typical task-stealing scheduler performs on par with GRAMPS on simple graphs, it does significantly worse on complex ones; a canonical GPGPU scheduler cannot exploit pipeline parallelism and suffers from large memory footprints; and a typical static, streaming scheduler achieves somewhat better locality, but suffers significant load imbalance on a generalpurpose multi-core due to fine-grain architecture variability (e.g., cache misses and SMT).

Tags: Computer science, CUDA, nVidia, Performance, Programming techniques

October 17, 2011 by hgpu

No votes yet.

Please wait...

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

high performance computing on graphics processing units: hgpu.org

Dynamic Fine-Grain Scheduling of Pipeline Parallelism

Recent source codes

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 second

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA

Most viewed papers (last 30 days)

Dynamic Fine-Grain Scheduling of Pipeline Parallelism

Share this:

Recent source codes

Most viewed papers (last 30 days)