The Distribution of OpenCL Kernel Execution Across Multiple Devices
Department of Electrical and Computer Engineering, University of Toronto
University of Toronto, 2014
@phdthesis{gurfinkel2014distribution,
title={The Distribution of OpenCL Kernel Execution Across Multiple Devices},
author={Gurfinkel, Steven},
year={2014},
school={University of Toronto}
}
Many computer systems now include both CPUs and programmable GPUs. OpenCL, a new programming framework, can program individual CPUs or GPUs; however, distributing a problem across multiple devices is more difficult. This thesis contributes three OpenCL runtimes that automatically distribute a problem across multiple devices: DualCL and m2sOpenCL, which distribute tasks across a single system’s CPU and GPU, and DistCL, which distributes tasks across a cluster’s GPUs. DualCL and DistCL run on existing hardware, whereas m2sOpenCL runs in simulation. Running programs from the Rodinia benchmark suite on a system with a discrete GPU and on a system with integrated CPU and GPU devices, DualCL improves performance over a single device when host memory is used. Running similar benchmarks, m2sOpenCL shows that reducing the overheads present in current systems improves performance. DistCL accelerates unmodified, compute-intensive OpenCL kernels, obtained from Rodinia, AMD samples and elsewhere, when distributing them across a cluster.
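To make the idea of distributing one kernel across several devices concrete, the sketch below splits a one-dimensional index range into contiguous per-device chunks, each a whole number of work-groups. It is an illustrative assumption, not the partitioning scheme used by DualCL, m2sOpenCL, or DistCL: the function name `split_ndrange`, the `weights` parameter (a rough relative-throughput guess per device), and the proportional policy are all hypothetical. In a real OpenCL program, each `(offset, size)` pair would correspond to the global work offset and global work size passed when enqueueing the kernel on one device.

```python
def split_ndrange(global_size, weights, work_group_size=1):
    """Split the 1-D range [0, global_size) into contiguous chunks.

    Hypothetical helper for illustration only (not taken from the thesis).
    weights: one non-negative number per device (assumed relative speed).
    Returns a list of (offset, size) pairs, one per device, where every
    size is a multiple of work_group_size.
    """
    if global_size % work_group_size != 0:
        raise ValueError("global_size must be a multiple of work_group_size")
    if not weights or any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")

    groups = global_size // work_group_size  # total number of work-groups
    total = sum(weights)
    chunks = []
    start = 0        # first work-group of the current chunk
    cumulative = 0   # running sum of weights seen so far
    for i, w in enumerate(weights):
        cumulative += w
        if i == len(weights) - 1:
            end = groups  # last device takes whatever remains
        else:
            end = int(groups * cumulative / total)
        chunks.append((start * work_group_size,
                       (end - start) * work_group_size))
        start = end
    return chunks


if __name__ == "__main__":
    # Example: 1024 work-items, work-groups of 64, and a second device
    # assumed to be three times as fast as the first.
    for device, (offset, size) in enumerate(split_ndrange(1024, [1, 3], 64)):
        print(f"device {device}: offset={offset}, size={size}")
```

A static split like this ignores data-transfer costs and any dependence between work-items and the memory they touch, which is part of what makes multi-device distribution harder than programming a single device.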
October 16, 2014 by hgpu