Compiler-based Data Prefetching and Streaming Non-temporal Store Generation for the Intel Xeon Phi Coprocessor
Intel Corporation
Workshop on Multithreaded Architectures and Applications (MTAAP 2013), 2013
@article{krishnaiyer2013compiler,
title={Compiler-based Data Prefetching and Streaming Non-temporal Store Generation for the Intel{\textregistered} Xeon Phi{\texttrademark} Coprocessor},
author={Krishnaiyer, Rakesh and K{\"u}lt{\"u}rsay, Emre and Chawla, Pankaj and Preis, Serguei and Zvezdin, Anatoly and Saito, Hideki},
year={2013}
}
The Intel Xeon Phi coprocessor has software prefetching instructions to hide memory latencies and special store instructions to save bandwidth on streaming non-temporal store operations. In this work, we provide details on compiler-based generation of these instructions and evaluate their impact on the performance of the Intel Xeon Phi coprocessor using a wide range of parallel applications with different characteristics. Our results show that the Intel Composer XE 2013 compiler can make effective use of these mechanisms to achieve significant performance improvements.
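
To make the two mechanisms named in the abstract concrete, the following is a minimal hand-written sketch of the general idea, not the code the Intel compiler generates and not the coprocessor's own instructions. It assumes a GCC- or Clang-compatible compiler and uses portable stand-ins: `__builtin_prefetch` for software prefetching and, on x86 targets with SSE2, `_mm_stream_si32` for non-temporal stores. The function name `scale_stream` and the constant `PREFETCH_DISTANCE` are hypothetical and chosen only for illustration; a suitable prefetch distance depends on the loop and the hardware.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_stream_si32, _mm_sfence */
#endif

/* Hypothetical tuning parameter: how many elements ahead to prefetch. */
#define PREFETCH_DISTANCE 64

/* Writes dst[i] = src[i] * k for i in [0, n).
 * src is read once and dst is written once, so neither benefits from
 * staying in cache: a typical streaming access pattern. */
static void scale_stream(int *dst, const int *src, size_t n, int k)
{
    assert(dst != NULL && src != NULL);

    for (size_t i = 0; i < n; ++i) {
#if defined(__GNUC__)
        /* Software prefetch: request a future element of src early so the
         * memory latency overlaps with the work on the current element.
         * Arguments: address, 0 = prefetch for read, 0 = low temporal locality. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&src[i + PREFETCH_DISTANCE], 0, 0);
#endif
#if defined(__SSE2__)
        /* Non-temporal store: write the result with a hint that it should
         * bypass the cache, since dst is not read again soon. */
        _mm_stream_si32(&dst[i], src[i] * k);
#else
        /* Fallback for targets without SSE2: an ordinary store. */
        dst[i] = src[i] * k;
#endif
    }

#if defined(__SSE2__)
    /* Make the non-temporal stores globally visible before returning. */
    _mm_sfence();
#endif
}
```

Calling `scale_stream(dst, src, n, 3)` on two `int` arrays of length `n` fills `dst` with three times each element of `src`. The prefetch and the non-temporal store are hints only, so the result is the same with or without them; the work described above concerns having the compiler insert such instructions automatically rather than by hand.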
August 8, 2013 by hgpu