A Case for Work-stealing on FPGAs with OpenCL Atomics
Imperial College London, UK
FPGA, 2016
@article{ramanathan2016case,
title={A Case for Work-stealing on FPGAs with OpenCL Atomics},
author={Ramanathan, Nadesh and Wickerson, John and Winterstein, Felix and Constantinides, George A},
year={2016}
}
We provide a case study of work-stealing, a popular method for run-time load balancing, on FPGAs. Following the Cederman-Tsigas implementation for GPUs, we synchronize work-items not with locks, mutexes, or critical sections, but instead with the atomic operations provided by Altera’s OpenCL SDK. We evaluate work-stealing for FPGAs by synthesizing a K-means clustering algorithm on an Altera P385 D5 board, both with work-stealing and with a statically partitioned load. When block RAM utilization is maximized in both cases, we find that work-stealing leads to a 1.5x speedup. This demonstrates that the ability to do load balancing at run-time can outweigh the drawback of using "expensive" atomics on FPGAs. We hope that our case study will stimulate further research into the high-level synthesis of fine-grained, lock-free, concurrent programs.
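To make the lock-free idea in the abstract concrete, here is a minimal sketch of a bounded work-stealing deque in the style of Arora, Blumofe and Plaxton. It is written in plain C11 with `<stdatomic.h>` rather than OpenCL, and it is not the paper's kernel code: the names `deque_t`, `deque_push`, `deque_pop`, `deque_steal` and the capacity `DEQUE_CAPACITY` are hypothetical choices made for illustration. The owner pushes and pops at the bottom, thieves steal from the top, and the only synchronization is a compare-and-swap on a packed (tag, top) word.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEQUE_CAPACITY 64  /* hypothetical fixed size, chosen for the sketch */

typedef struct {
    atomic_int tasks[DEQUE_CAPACITY]; /* task ids; atomic to avoid data races */
    atomic_uint bottom;               /* next free slot, written by the owner only */
    _Atomic uint64_t age;             /* (tag << 32) | top, updated with one CAS */
} deque_t;

static uint64_t pack(uint32_t tag, uint32_t top) { return ((uint64_t)tag << 32) | top; }
static uint32_t age_top(uint64_t a) { return (uint32_t)a; }
static uint32_t age_tag(uint64_t a) { return (uint32_t)(a >> 32); }

void deque_init(deque_t *d) {
    atomic_init(&d->bottom, 0);
    atomic_init(&d->age, 0);
}

/* Owner only: append a task at the bottom. Returns false when full. */
bool deque_push(deque_t *d, int task) {
    unsigned b = atomic_load(&d->bottom);
    if (b >= DEQUE_CAPACITY) return false;
    atomic_store(&d->tasks[b], task);
    atomic_store(&d->bottom, b + 1);
    return true;
}

/* Any thief: try to take the oldest task from the top. */
bool deque_steal(deque_t *d, int *out) {
    uint64_t old_age = atomic_load(&d->age);
    unsigned b = atomic_load(&d->bottom);
    if (b <= age_top(old_age)) return false;          /* deque is empty */
    int task = atomic_load(&d->tasks[age_top(old_age)]);
    uint64_t new_age = pack(age_tag(old_age), age_top(old_age) + 1);
    /* The CAS fails if another thief, or the owner's reset, got there first. */
    if (atomic_compare_exchange_strong(&d->age, &old_age, new_age)) {
        *out = task;
        return true;
    }
    return false;
}

/* Owner only: take the newest task from the bottom. */
bool deque_pop(deque_t *d, int *out) {
    unsigned b = atomic_load(&d->bottom);
    if (b == 0) return false;
    b--;
    atomic_store(&d->bottom, b);
    int task = atomic_load(&d->tasks[b]);
    uint64_t old_age = atomic_load(&d->age);
    if (b > age_top(old_age)) {                        /* no thief can reach slot b */
        *out = task;
        return true;
    }
    /* At most one task was left: reset the deque and bump the tag so that
       a thief holding a stale (tag, top) pair cannot succeed later (ABA). */
    atomic_store(&d->bottom, 0);
    uint64_t new_age = pack(age_tag(old_age) + 1, 0);
    if (b == age_top(old_age) &&
        atomic_compare_exchange_strong(&d->age, &old_age, new_age)) {
        *out = task;                                   /* owner won the last task */
        return true;
    }
    atomic_store(&d->age, new_age);                    /* a thief took it first */
    return false;
}
```

In a work-stealing scheduler built on this sketch, each worker would call `deque_pop` on its own deque and fall back to `deque_steal` on another worker's deque when its own is empty. In OpenCL C, the compare-and-swap would presumably map to a built-in such as `atomic_cmpxchg`, which is the kind of "expensive" atomic the abstract refers to.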
January 14, 2016 by hgpu