Evaluating the Performance Impact of Multiple Streams on the MIC-based Heterogeneous Platform
Software Institute, School of Computer, National University of Defense Technology, Changsha, China
arXiv:1603.08619 [cs.DC], (29 Mar 2016)
@article{li2016evaluating,
title={Evaluating the Performance Impact of Multiple Streams on the MIC-based Heterogeneous Platform},
author={Li, Zhaokui and Fang, Jianbin and Tang, Tao and Chen, Xuhao and Chen, Cheng and Yang, Canqun},
year={2016},
month={mar},
eprint={1603.08619},
archivePrefix={arXiv},
primaryClass={cs.DC}
}
Using multiple streams can improve overall system performance by mitigating the data transfer overhead on heterogeneous systems. Prior work focuses largely on GPUs, but little is known about the performance impact on (Intel Xeon) Phi. In this work, we apply multiple streams to six real-world applications on Phi. We then systematically evaluate the performance benefits of using multiple streams. The evaluation is performed at two levels: the microbenchmark level and the real-world application level. Our experimental results at the microbenchmark level show that data transfers and kernel execution can be overlapped on Phi, while data transfers in both directions are performed serially. At the real-world application level, we show that both overlappable and non-overlappable applications can benefit from using multiple streams (with a performance improvement of up to 24%). We also quantify how task granularity and resource granularity impact the overall performance. Finally, we present a set of heuristics to reduce the search space when determining a proper task granularity and resource granularity. To conclude, our evaluation provides insights for runtime and architecture designers when using multiple streams on Phi.
April 3, 2016 by hgpu