{"id":26931,"date":"2022-06-19T16:15:54","date_gmt":"2022-06-19T13:15:54","guid":{"rendered":"https:\/\/hgpu.org\/?p=26931"},"modified":"2022-06-19T16:15:54","modified_gmt":"2022-06-19T13:15:54","slug":"cupbop-cuda-for-parallelized-and-broad-range-processors","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=26931","title":{"rendered":"CuPBoP: CUDA for Parallelized and Broad-range Processors"},"content":{"rendered":"<p>CUDA is one of the most popular choices for GPU programming, but it can only be executed on NVIDIA GPUs. Executing CUDA on non-NVIDIA devices not only benefits the hardware community, but also allows data-parallel computation in heterogeneous systems. To make CUDA programs portable, some researchers have proposed using source-to-source translators to translate CUDA to portable programming languages that can be executed on non-NVIDIA devices. However, most CUDA translators require additional manual modifications on the translated code, which imposes a heavy workload on developers. In this paper, CuPBoP is proposed to execute CUDA on non-NVIDIA devices without relying on any portable programming languages. Compared with existing work that executes CUDA on non-NVIDIA devices, CuPBoP does not require manual modification of the CUDA source code, but it still achieves the highest coverage (69.6%), much higher than existing frameworks (56.6%) on the Rodinia benchmark. In particular, for CPU backends, CuPBoP supports several ISAs (e.g., X86, RISC-V, AArch64) and has close or even higher performance compared with other projects. We also compare and analyze the performance among CuPBoP, manually optimized OpenMP\/MPI programs, and CUDA programs on the latest Ampere architecture GPU, and show future directions for supporting CUDA programs on non-NVIDIA devices with high performance<\/p>\n","protected":false},"excerpt":{"rendered":"<p>CUDA is one of the most popular choices for GPU programming, but it can only be executed on NVIDIA GPUs. Executing CUDA on non-NVIDIA devices not only benefits the hardware community, but also allows data-parallel computation in heterogeneous systems. To make CUDA programs portable, some researchers have proposed using source-to-source translators to translate CUDA to [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[11,89,3],"tags":[451,955,1782,14,452,2063,242,20,2073,252],"class_list":["post-26931","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-nvidia-cuda","category-paper","tag-benchmarking","tag-compilers","tag-computer-science","tag-cuda","tag-heterogeneous-systems","tag-hip","tag-mpi","tag-nvidia","tag-nvidia-geforce-gtx-1660-ti","tag-openmp"],"views":1439,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/26931","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=26931"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/26931\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=26931"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=26931"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=26931"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}