The Distribution of OpenCL Kernel Execution Across Multiple Devices

Steven Gurfinkel
Department of Electrical and Computer Engineering, University of Toronto
University of Toronto, 2014







Many computer systems now include both CPUs and programmable GPUs. OpenCL, a new programming framework, can program individual CPUs or GPUs; however, distributing a problem across multiple devices is more difficult. This thesis contributes three OpenCL runtimes that automatically distribute a problem across multiple devices: DualCL and m2sOpenCL, which distribute tasks across a single system’s CPU and GPU, and DistCL, which distributes tasks across a cluster’s GPUs. DualCL and DistCL run on existing hardware, while m2sOpenCL runs in simulation. Running programs from the Rodinia benchmark suite on both a system with a discrete GPU and a system with integrated CPU and GPU devices, DualCL improves performance over a single device when host memory is used. Running similar benchmarks, m2sOpenCL shows that reducing the overheads present in current systems improves performance. DistCL accelerates unmodified, compute-intensive OpenCL kernels, obtained from Rodinia, AMD samples, and elsewhere, when distributing them across a cluster.