Study of Bandwidth Partitioning for Co-executing GPU Kernels

Erik Melander
Department of Information Technology, Uppsala University
Uppsala University, 2017








Co-executing GPU kernels on a partitioned GPU has been shown to improve the utilization efficiency of poorly scaling tasks. While kernels can execute in parallel, data transfers to the GPU are serial, which can negatively impact the parallelism and predictability of the kernels. In this work we implement a fairness-based approach to memory transfers by splitting data sets into chunks and transferring the chunks interleaved, and we evaluate the overhead of this approach. We then develop a model to predict when kernels will start under this implementation. We found that chunked transfers in a single CUDA stream have only a small overhead compared to serial transfers, while event-synchronized transfers in several streams have a larger overhead, particularly for chunk sizes below 500 KB. The prediction models accurately estimate kernel start times and return transfer times, with less than 2.7% relative error.
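
To make the idea of interleaved chunked transfers concrete, here is a minimal Python sketch. It is not the model developed in the thesis. It assumes a single serial link with constant bandwidth, a fixed chunk size, round-robin ordering between data sets, and an optional fixed per-chunk overhead. The function names and parameters are hypothetical and chosen only for illustration. It estimates the time at which each data set has fully arrived, which is the earliest time the corresponding kernel could start.

```python
from collections import deque


def interleaved_finish_times(sizes_bytes, chunk_bytes, bandwidth_bytes_per_s,
                             per_chunk_overhead_s=0.0):
    """Estimate when each data set has fully arrived under round-robin chunking.

    Assumptions (illustrative, not taken from the thesis): one serial link,
    constant bandwidth, and a fixed cost added to every chunk.
    """
    remaining = list(sizes_bytes)
    finish = [0.0] * len(remaining)
    # Data sets that still have bytes to send, served in round-robin order.
    queue = deque(i for i, size in enumerate(remaining) if size > 0)
    now = 0.0
    while queue:
        i = queue.popleft()
        sent = min(chunk_bytes, remaining[i])
        now += sent / bandwidth_bytes_per_s + per_chunk_overhead_s
        remaining[i] -= sent
        if remaining[i] > 0:
            queue.append(i)   # more chunks left: go to the back of the line
        else:
            finish[i] = now   # last chunk arrived: the kernel could start now
    return finish


def serial_finish_times(sizes_bytes, bandwidth_bytes_per_s):
    """Baseline: transfer each data set completely before starting the next."""
    now = 0.0
    finish = []
    for size in sizes_bytes:
        now += size / bandwidth_bytes_per_s
        finish.append(now)
    return finish


if __name__ == "__main__":
    sizes = [4, 2]  # toy sizes in bytes, with a bandwidth of 1 byte/s
    print("serial:     ", serial_finish_times(sizes, 1))
    print("interleaved:", interleaved_finish_times(sizes, 1, 1))
```

In this toy case the smaller data set finishes earlier when interleaved than when it waits behind the larger one, while the larger data set finishes later. The total transfer time is unchanged when the per-chunk overhead is zero, and setting `per_chunk_overhead_s` above zero shows how smaller chunks increase the total.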
