{"id":28279,"date":"2023-05-21T22:03:56","date_gmt":"2023-05-21T19:03:56","guid":{"rendered":"https:\/\/hgpu.org\/?p=28279"},"modified":"2023-05-21T22:03:56","modified_gmt":"2023-05-21T19:03:56","slug":"an-asynchronous-dataflow-driven-execution-model-for-distributed-accelerator-computing","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=28279","title":{"rendered":"An Asynchronous Dataflow-Driven Execution Model For Distributed Accelerator Computing"},"content":{"rendered":"<p>While domain-specific HPC software packages continue to thrive and are vital to many scientific communities, a general purpose high-productivity GPU cluster programming model that facilitates experimentation for non-experts remains elusive. We demonstrate how Celerity, a high-level C++ programming model for distributed accelerator computing based on the open SYCL standard, allows for the quick development of \u2013 and experimentation with \u2013 distributed applications. To achieve scalability on large machines, we replace Celerity\u2019s existing master\/worker scheduling model with a fully distributed scheme that reduces the worst-case scheduling complexity from quadratic to linear while maintaining the existing programming interface. We then show how this declarative, data-flow based API paired with a point-to-point communication model with eager data pushing can effectively expose and leverage opportunities for latency hiding and computation\/communication overlapping with minimal or no manual guidance. We demonstrate how Celerity exhibits very good scalability on multiple benchmarks from several scientific domains and up to 128 GPUs.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>While domain-specific HPC software packages continue to thrive and are vital to many scientific communities, a general purpose high-productivity GPU cluster programming model that facilitates experimentation for non-experts remains elusive. We demonstrate how Celerity, a high-level C++ programming model for distributed accelerator computing based on the open SYCL standard, allows for the quick development of [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[11,89,3],"tags":[451,1782,14,106,242,20,2115,176,1845],"class_list":["post-28279","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-nvidia-cuda","category-paper","tag-benchmarking","tag-computer-science","tag-cuda","tag-gpu-cluster","tag-mpi","tag-nvidia","tag-nvidia-v100","tag-package","tag-sycl"],"views":1217,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/28279","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=28279"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/28279\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=28279"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=28279"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=28279"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}