Study of Bandwidth Partitioning for Co-executing GPU Kernels
Department of Information Technology, Uppsala University
Uppsala University, 2017
@misc{melander2017study,
   title={Study of Bandwidth Partitioning for Co-executing GPU Kernels},
   author={Melander, Erik},
   year={2017}
}
Co-executing GPU kernels on a partitioned GPU has been shown to improve the utilization efficiency of poorly scaling tasks. While kernels can be executed in parallel, data transfers to the GPU are serial, which can negatively impact both the parallelism and the predictability of the kernels. In this work we implement a fairness-based approach to memory transfers by splitting data sets into chunks and transferring them in an interleaved fashion, and we evaluate the overhead of this approach. We then develop a model to predict when kernels will start under this implementation. We found that chunked transfers in a single CUDA stream incur only a small overhead compared to serial transfers, while event-synchronized transfers in several streams have a larger overhead, particularly for chunk sizes below 500 KB. The prediction models accurately estimate kernel start times and return-transfer times with less than 2.7% relative error.
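The core idea of the abstract can be sketched in a few lines: each kernel's input is split into fixed-size chunks, the chunks are transferred in a round-robin (fair) order, and a kernel's predicted start time is the point at which its last chunk has arrived on the device. The sketch below is an illustrative host-side model under assumed names (`interleaved_schedule`, `predicted_start_times`) and a constant-bandwidth assumption; it is not the thesis implementation, and real CUDA transfers would use `cudaMemcpyAsync` on separate streams.

```python
# Illustrative model (not the thesis code): round-robin chunked transfers
# and a simple start-time predictor assuming constant transfer bandwidth.

def interleaved_schedule(dataset_bytes, chunk_bytes):
    """Return a round-robin list of (kernel_id, chunk_size) transfer items.

    Each kernel contributes one chunk per round until its data set is
    exhausted, which is the fairness property described in the abstract.
    """
    remaining = list(dataset_bytes)
    order = []
    while any(r > 0 for r in remaining):
        for k, r in enumerate(remaining):
            if r > 0:
                size = min(chunk_bytes, r)
                order.append((k, size))
                remaining[k] -= size
    return order

def predicted_start_times(dataset_bytes, chunk_bytes, bandwidth_bps,
                          per_chunk_overhead_s=0.0):
    """Predict when each kernel can start: the time its last chunk lands.

    Transfers are serialized on the PCIe link, so completion times simply
    accumulate along the interleaved schedule; per_chunk_overhead_s models
    the fixed cost each extra chunk adds (e.g. API-call latency).
    """
    t = 0.0
    start = [0.0] * len(dataset_bytes)
    for k, size in interleaved_schedule(dataset_bytes, chunk_bytes):
        t += size / bandwidth_bps + per_chunk_overhead_s
        start[k] = t
    return start
```

For example, with a 4 MB and a 2 MB data set, 1 MB chunks, and 1 GB/s bandwidth, the smaller kernel's data completes after 4 ms instead of waiting behind the full 4 MB transfer, which is the predictability benefit interleaving is meant to provide; a nonzero `per_chunk_overhead_s` captures why very small chunks (under 500 KB in the evaluation) become expensive.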
December 7, 2017 by hgpu