{"id":11361,"date":"2014-02-08T02:45:09","date_gmt":"2014-02-08T00:45:09","guid":{"rendered":"http:\/\/hgpu.org\/?p=11361"},"modified":"2015-08-27T01:38:29","modified_gmt":"2015-08-26T22:38:29","slug":"developmental-directions-in-parallel-accelerators","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=11361","title":{"rendered":"Developmental Directions in Parallel Accelerators"},"content":{"rendered":"<p>Parallel accelerators such as massively-cored graphical processing units or many-cored co-processors such as the Xeon Phi are becoming widespread and affordable on many systems including blade servers and even desktops. The use of a single such accelerator is now quite common for many applications, but the use of multiple devices and hybrid combinations is still very unusual. The main barrier to greater uptake of multiple accelerators in applications is still the software ecosystem and in particular the interoperability limitations of setting up appropriate software stacks for novel accelerator combinations. We present some benchmark results for various multiple and hybrid accelerator combinations using some up to date modern devices and discuss feasible developmental directions for high computational performance scientific applications software to use them. We compare results with equivalent benchmarks on conventional multi-cored CPUs.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Parallel accelerators such as massively-cored graphical processing units or many-cored co-processors such as the Xeon Phi are becoming widespread and affordable on many systems including blade servers and even desktops. 
The use of a single such accelerator is now quite common for many applications, but the use of multiple devices and hybrid combinations is still [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[11,89,3],"tags":[451,1782,14,1483,20,1470],"class_list":["post-11361","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-nvidia-cuda","category-paper","tag-benchmarking","tag-computer-science","tag-cuda","tag-intel-xeon-phi","tag-nvidia","tag-nvidia-geforce-gtx-titan"],"views":1821,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/11361","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=11361"}],"version-history":[{"count":1,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/11361\/revisions"}],"predecessor-version":[{"id":14490,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/11361\/revisions\/14490"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=11361"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"htt
ps:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=11361"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=11361"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}