Supporting Iteration in a Heterogeneous Data Flow Engine
Microsoft Research
The 3rd Workshop on Systems for Future Multicore Architectures, 2013
@inproceedings{currey2013supporting,
title={Supporting Iteration in a Heterogeneous Data Flow Engine},
author={Currey, Jon and Baker, Simon and Rossbach, Christopher J.},
booktitle={The 3rd Workshop on Systems for Future Multicore Architectures},
year={2013}
}
Dataflow execution engines such as MapReduce, DryadLINQ, and PTask have enjoyed success because they simplify development for a class of important parallel applications. These systems sacrifice generality for simplicity: while many workloads are easily expressed, important idioms like iteration and recursion are difficult to express and support efficiently. We consider the problem of extending a dataflow engine to support data-dependent iteration in a heterogeneous environment, where architectural diversity introduces data migration and scheduling challenges that complicate the problem. We propose constructs that enable a dataflow engine to efficiently support data-dependent control flow in a heterogeneous environment, implement them in a prototype system called IDEA, and use them to implement a variant of optical flow, a well-studied computer vision algorithm. Optical flow relies heavily on nested loops, making it difficult to express without explicit support for iteration. We demonstrate that IDEA enables up to 18x speedup over sequential and 32% speedup over a GPU implementation using synchronous host-based control.
April 18, 2013 by hgpu