{"id":10643,"date":"2013-10-03T23:59:36","date_gmt":"2013-10-03T20:59:36","guid":{"rendered":"http:\/\/hgpu.org\/?p=10643"},"modified":"2013-10-03T23:59:36","modified_gmt":"2013-10-03T20:59:36","slug":"towards-multi-gpu-support-in-the-marrow-skeleton-framework","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=10643","title":{"rendered":"Towards Multi-GPU Support in the Marrow Skeleton Framework"},"content":{"rendered":"<p>An emerging trend in Graphics Processing Unit (GPU) computing is the harnessing of multiple devices to tackle bigger problems and increase performance. Multi-GPU execution adds new challenges to the already complex world of General Purpose computing on GPUs (GPGPU), such as efficient GPU-aware problem decomposition and coping with device heterogeneity. To this end, we propose the use of the Marrow algorithmic skeleton framework (ASkF) to abstract most of the details intrinsic to the programming of such platforms. To the best of our knowledge, Marrow is the first ASkF to support task-parallel skeletons, such as Pipeline, on both single- and (now) multi-GPU systems. The framework transparently decomposes the problem&#8217;s domain and schedules the generated tasks among a set of possibly heterogeneous devices. To assess the proposal&#8217;s effectiveness, we present initial experimental results that show good scalability.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>An emerging trend in Graphics Processing Unit (GPU) computing is the harnessing of multiple devices to tackle bigger problems and increase performance. Multi-GPU execution adds new challenges to the already complex world of General Purpose computing on GPUs (GPGPU), such as efficient GPU-aware problem decomposition and coping with device heterogeneity. To this [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[36,11,90,3],"tags":[1787,7,1361,1782,452,20,1009,1793,199],"class_list":["post-10643","post","type-post","status-publish","format-standard","hentry","category-algorithms","category-computer-science","category-opencl","category-paper","tag-algorithms","tag-ati","tag-ati-radeon-hd-7950","tag-computer-science","tag-heterogeneous-systems","tag-nvidia","tag-nvidia-quadro-fx-3800","tag-opencl","tag-tesla-c1060"],"views":2620,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/10643","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=10643"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/10643\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=10643"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=10643"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=10643"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}