{"id":29541,"date":"2024-11-17T17:53:28","date_gmt":"2024-11-17T15:53:28","guid":{"rendered":"https:\/\/hgpu.org\/?p=29541"},"modified":"2024-11-17T17:53:28","modified_gmt":"2024-11-17T15:53:28","slug":"kokkidio-fast-expressive-portable-code-based-on-kokkos-and-eigen","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=29541","title":{"rendered":"Kokkidio: Fast, expressive, portable code, based on Kokkos and Eigen"},"content":{"rendered":"<p>Kokkidio is a newly developed C++ template library that combines the performance portability framework Kokkos and its strength in utilising GPUs with the expressive syntax and CPU optimisations of the linear algebra library Eigen. Its unified abstractions enable both simple data management and clear, succinct compute code in kernel functors, where a novel iteration\/functor parameter performs target-specific grouping of operations. This preserves Eigen\u2019s loop abstractions, enabling explicit vectorisation of host code. A comprehensive evaluation across GPUs and CPUs from all major vendors shows Kokkidio providing significantly improved performance portability over Kokkos. With the fraction of best observed runtime as the efficiency metric, Kokkidio achieves a near-optimal harmonic mean of 0.95, compared to Kokkos\u2019 0.59 across all microbenchmarks and tested hardware, effectively merging Kokkos\u2019 GPU and Eigen\u2019s CPU performance into one framework.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Kokkidio is a newly developed C++ template library that combines the performance portability framework Kokkos and its strength in utilising GPUs with the expressive syntax and CPU optimisations of the linear algebra library Eigen. 
Its unified abstractions enable both simple data management and clear, succinct compute code in kernel functors, where a novel [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[11,3],"tags":[2087,7,1782,905,2133,37,20,2066,2035,176,1586],"class_list":["post-29541","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-paper","tag-amd-radeon-instinct-mi100","tag-ati","tag-computer-science","tag-intel","tag-intel-data-center-gpu-max-1550","tag-linear-algebra","tag-nvidia","tag-nvidia-a100","tag-nvidia-geforce-gtx-titan-v","tag-package","tag-performance-portability"],"views":1159,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/29541","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=29541"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/29541\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=29541"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/in
dex.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=29541"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=29541"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}