{"id":26153,"date":"2022-01-16T13:53:34","date_gmt":"2022-01-16T11:53:34","guid":{"rendered":"https:\/\/hgpu.org\/?p=26153"},"modified":"2022-01-16T13:53:34","modified_gmt":"2022-01-16T11:53:34","slug":"dopia-online-parallelism-management-for-integrated-cpu-gpu-architectures","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=26153","title":{"rendered":"Dopia: Online Parallelism Management for Integrated CPU\/GPU Architectures"},"content":{"rendered":"<p>Recent desktop and mobile processors often integrate CPU and GPU onto the same die. The limited memory bandwidth of these integrated architectures can negatively affect the performance of data-parallel workloads when all computational resources are active. The combination of active CPU and GPU cores achieving the maximum performance depends on a workload\u2019s characteristics, making manual tuning a time-consuming task. Dopia is a fully automated framework that improves the performance of data-parallel workloads by adjusting the Degree Of Parallelism on Integrated Architectures. Dopia transparently analyzes and rewrites OpenCL kernels before executing them with the number of CPU and GPU cores expected to yield the best performance. Evaluated on AMD and Intel integrated processors, Dopia achieves 84% of the maximum performance attainable by an oracle.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recent desktop and mobile processors often integrate CPU and GPU onto the same die. The limited memory bandwidth of these integrated architectures can negatively affect the performance of data-parallel workloads when all computational resources are active. The combination of active CPU and GPU cores achieving the maximum performance depends on a workload\u2019s characteristics, making manual [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[11,90,3],"tags":[215,1782,1025,1793],"class_list":["post-26153","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-opencl","category-paper","tag-code-generation","tag-computer-science","tag-machine-learning","tag-opencl"],"views":1475,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/26153","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=26153"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/26153\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=26153"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=26153"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=26153"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}