{"id":10253,"date":"2013-08-08T23:53:27","date_gmt":"2013-08-08T20:53:27","guid":{"rendered":"http:\/\/hgpu.org\/?p=10253"},"modified":"2013-08-08T23:53:27","modified_gmt":"2013-08-08T20:53:27","slug":"compiler-based-data-prefetching-and-streaming-non-temporal-store-generation-for-the-intel-xeon-phi-coprocessor","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=10253","title":{"rendered":"Compiler-based Data Prefetching and Streaming Non-temporal Store Generation for the Intel Xeon Phi Coprocessor"},"content":{"rendered":"<p>The Intel Xeon Phi coprocessor has software prefetching instructions to hide memory latencies and special store instructions to save bandwidth on streaming nontemporal store operations. In this work, we provide details on compiler-based generation of these instructions and evaluate their impact on the performance of the Intel Xeon Phi coprocessor using a wide range of parallel applications with different characteristics. Our results show that the Intel Composer XE 2013 compiler can make effective use of these mechanisms to achieve significant performance improvements.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Intel Xeon Phi coprocessor has software prefetching instructions to hide memory latencies and special store instructions to save bandwidth on streaming nontemporal store operations. In this work, we provide details on compiler-based generation of these instructions and evaluate their impact on the performance of the Intel Xeon Phi coprocessor using a wide range of [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[11,3],"tags":[955,1782,905,1388,546],"class_list":["post-10253","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-paper","tag-compilers","tag-computer-science","tag-intel","tag-intel-phi","tag-prefetch"],"views":3295,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/10253","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=10253"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/10253\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=10253"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=10253"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=10253"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}