{"id":6968,"date":"2012-01-19T14:07:13","date_gmt":"2012-01-19T12:07:13","guid":{"rendered":"http:\/\/hgpu.org\/?p=6968"},"modified":"2012-01-19T14:07:13","modified_gmt":"2012-01-19T12:07:13","slug":"power-profiling-and-optimization-for-heterogeneous-multi-core-systems","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=6968","title":{"rendered":"Power Profiling and Optimization for Heterogeneous Multi-Core Systems"},"content":{"rendered":"<p>Processing speed and energy efficiency are two of the most critical issues for computer systems. This paper presents a systematic approach for profiling the power and performance characteristics of application targeting heterogeneous multi-core computing platforms. Our approach enables rapid and automated design space exploration involving optimisation of workload distribution for systems with accelerators such as FPGAs and GPUs. We demonstrate that, with minor modification to the design, it is possible to estimate performance and power efficiency trade off to identify optimized workload distribution. Our approach shows that, for N-body computation, the fastest design which involves 2 CPU cores, 10 FPGA cores and 40960 GPU threads, is 2 times faster than a design with only FPGAs while achieving better overall energy efficiency.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Processing speed and energy efficiency are two of the most critical issues for computer systems. This paper presents a systematic approach for profiling the power and performance characteristics of application targeting heterogeneous multi-core computing platforms. Our approach enables rapid and automated design space exploration involving optimisation of workload distribution for systems with accelerators such as [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[11,89,3],"tags":[1782,14,533,344,377,452,258,20,298,199],"class_list":["post-6968","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-nvidia-cuda","category-paper","tag-computer-science","tag-cuda","tag-design-space-exploration","tag-energy-efficient-computing","tag-fpga","tag-heterogeneous-systems","tag-n-body-simulation","tag-nvidia","tag-optimization","tag-tesla-c1060"],"views":1947,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/6968","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=6968"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/6968\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=6968"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=6968"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=6968"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}