{"id":24044,"date":"2020-11-15T13:22:14","date_gmt":"2020-11-15T11:22:14","guid":{"rendered":"https:\/\/hgpu.org\/?p=24044"},"modified":"2020-11-15T13:22:14","modified_gmt":"2020-11-15T11:22:14","slug":"automatic-gpu-optimization-through-higher-order-functions-in-functional-languages","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=24044","title":{"rendered":"Automatic GPU optimization through higher-order functions in functional languages"},"content":{"rendered":"<p>Over recent years, graphics processing units (GPUs) have become popular devices to use in procedures that exhibit data-parallelism. Due to high parallel capability, running procedures on a GPU can result in an execution time speedup ranging from a couple times faster to several orders of magnitude faster, compared to executing serially on a central processing unit (CPU). Interfaces such as CUDA and OpenCL flexibly exposes the parallel capabilities of the GPU to the programmer, while at the same time putting a lot of responsibility on the programmer to handle aspects such as thread synchronization and memory management. A different approach to GPU optimization is to enable it through higher-order functions with known data-parallelism, using the semantics of the higher-order function to determine the parallel execution. This approach has in practice been integrated into existing languages through libraries or been integrated directly into languages themselves. However, higher-order functions do not address when it is beneficial to execute on a GPU. Due to the GPU being a separate device, effects such as latency and memory transfer can cause a slowdown for small input values. In this thesis, a set of commonly used higher-order functions are GPU enabled as compiler intrinsics in a small functional language. These higher-order functions are also equipped with the option of automatically deciding at runtime if to execute on GPU or CPU. Results show that running higher-order functions on GPU yields a speedup for larger computations. However, the performance does not match existing solutions that provide additional higher-order functions for optimizing the parallelization. The selected approach for automatically deciding whether to run a higher-order function on GPU or on CPU results in the faster option a majority of cases. Though the most notable benefit of automatic decisions was for procedures that use multiple higher-order function invocations, which ran faster compared to when executing only on GPU or only on CPU.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Over recent years, graphics processing units (GPUs) have become popular devices to use in procedures that exhibit data-parallelism. 
Tags: computer science, CUDA, nVidia, nVidia Quadro P2000, optimization, thesis