high performance computing on graphics processing units: hgpu.org

Evaluating the Performance Impact of Multiple Streams on the MIC-based Heterogeneous Platform

Zhaokui Li, Jianbin Fang, Tao Tang, Xuhao Chen, Cheng Chen, Canqun Yang

Software Institute, School of Computer, National University of Defense Technology, Changsha, China

arXiv:1603.08619 [cs.DC], (29 Mar 2016)

@article{li2016evaluating,
  title={Evaluating the Performance Impact of Multiple Streams on the MIC-based Heterogeneous Platform},
  author={Li, Zhaokui and Fang, Jianbin and Tang, Tao and Chen, Xuhao and Chen, Cheng and Yang, Canqun},
  year={2016},
  month={mar},
  archivePrefix={arXiv},
  primaryClass={cs.DC}
}

Using multiple streams can improve overall system performance by mitigating the data transfer overhead on heterogeneous systems. Prior work has focused largely on GPUs, and little is known about the performance impact on the (Intel Xeon) Phi. In this work, we apply multiple streams to six real-world applications on the Phi and systematically evaluate the performance benefits. The evaluation is performed at two levels: the microbenchmark level and the real-world application level. Our microbenchmark results show that data transfers and kernel execution can be overlapped on the Phi, whereas data transfers in the two directions are serialized. At the application level, we show that both overlappable and non-overlappable applications can benefit from multiple streams, with a performance improvement of up to 24%. We also quantify how task granularity and resource granularity affect overall performance. Finally, we present a set of heuristics that reduce the search space when choosing a suitable task granularity and resource granularity. Our evaluation provides many insights for runtime and architecture designers who use multiple streams on the Phi.
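The central mechanism in the abstract, overlapping data transfers with kernel execution while copies in both directions are serialized, can be illustrated with a toy timing model. This is a hypothetical sketch, not the authors' methodology: it assumes the work splits into equal chunks (one per stream), a single transfer engine is shared by host-to-device and device-to-host copies, kernels run one at a time, and there is no per-chunk overhead. The function `pipelined_time` and its parameters are invented for this illustration.

```python
def pipelined_time(t_in, t_kernel, t_out, n_streams):
    """Makespan of a toy multi-stream pipeline (hypothetical model).

    Assumptions: the work is split into n_streams equal chunks; one transfer
    engine serializes all copies in both directions; kernels execute one at a
    time but may overlap with copies; copies are issued breadth-first
    (all inputs first, then all outputs); per-chunk overhead is ignored.
    """
    h = t_in / n_streams      # host-to-device time per chunk
    k = t_kernel / n_streams  # kernel time per chunk
    o = t_out / n_streams     # device-to-host time per chunk

    # Input copies occupy the shared link back to back.
    link_free = 0.0
    h2d_done = []
    for _ in range(n_streams):
        link_free += h
        h2d_done.append(link_free)

    # Each kernel waits for its own input and for the previous kernel.
    kernel_free = 0.0
    kernel_done = []
    for i in range(n_streams):
        start = max(h2d_done[i], kernel_free)
        kernel_free = start + k
        kernel_done.append(kernel_free)

    # Output copies reuse the same (serialized) link after the inputs.
    for i in range(n_streams):
        start = max(link_free, kernel_done[i])
        link_free = start + o
    return link_free


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, pipelined_time(4, 8, 4, n))
```

With 4 units of input transfer, 8 of computation and 4 of output transfer, one stream takes the fully serial 16 units, while 2, 4 and 8 streams give 12, 10 and 9 units in this model. Because copies in both directions share one engine, the makespan can never drop below t_in + t_out (8 units here). The model omits per-chunk launch and transfer overhead, so it cannot capture the cost of choosing too fine a task granularity.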

Tags: Benchmarking, Computer science, Heterogeneous systems, Intel Xeon Phi, Performance

April 3, 2016 by hgpu
