{"id":12030,"date":"2014-05-09T06:22:06","date_gmt":"2014-05-09T03:22:06","guid":{"rendered":"http:\/\/hgpu.org\/?p=12030"},"modified":"2014-05-09T06:22:06","modified_gmt":"2014-05-09T03:22:06","slug":"automatic-scheduling-of-compute-kernels-across-heterogeneous-architectures","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=12030","title":{"rendered":"Automatic Scheduling of Compute Kernels Across Heterogeneous Architectures"},"content":{"rendered":"<p>The world of high-performance computing has shifted from increasing single-core performance to extracting performance from heterogeneous multi- and many-core processors due to the power, memory and instruction-level parallelism walls. All trends point towards increased processor heterogeneity as a means for increasing application performance, from smartphones to servers. These various architectures are designed for different types of applications &#8211; traditional &quot;big&quot; CPUs (like the Intel Xeon or AMD Opteron) are optimized for low latency while other architectures (such as the NVIDIA Tesla K20x or Intel Xeon Phi) are optimized for high throughput. These architectures have different tradeoffs and different performance profiles, which can mean large performance gains for the right types of applications. However, applications that are ill-suited to a given architecture may experience significant slowdowns; therefore, it is imperative that applications are scheduled onto the correct processor. In order to perform this scheduling, applications must be analyzed to determine their execution characteristics (e.g. an application that contains a lot of branching may be better suited to a traditional CPU). Traditionally this application-to-hardware mapping was determined statically by the programmer. However, this requires intimate knowledge of the application and underlying architecture, and precludes load-balancing by the system. 
We demonstrate and empirically evaluate a system for automatically scheduling compute kernels by extracting program characteristics and applying machine learning techniques. We develop a machine learning process that is system-agnostic, and works for a variety of contexts (e.g. embedded, desktop\/workstation, server). Finally, we perform scheduling in a workload-aware and workload-adaptive manner for these compute kernels.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The world of high-performance computing has shifted from increasing single-core performance to extracting performance from heterogeneous multi- and many-core processors due to the power, memory and instruction-level parallelism walls. All trends point towards increased processor heterogeneity as a means for increasing application performance, from smartphones to servers. These various architectures are designed for different types [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[11,90,3],"tags":[955,1782,452,1483,1025,20,1793,252,854,1226],"class_list":["post-12030","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-opencl","category-paper","tag-compilers","tag-computer-science","tag-heterogeneous-systems","tag-intel-xeon-phi","tag-machine-learning","tag-nvidia","tag-opencl","tag-openmp","tag-task-scheduling","tag-tesla-c2075"],"views":1943,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_r
oute=\/wp\/v2\/posts\/12030","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=12030"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/12030\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=12030"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=12030"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=12030"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}