{"id":24200,"date":"2020-12-06T14:52:39","date_gmt":"2020-12-06T12:52:39","guid":{"rendered":"https:\/\/hgpu.org\/?p=24200"},"modified":"2020-12-06T14:52:39","modified_gmt":"2020-12-06T12:52:39","slug":"exploring-fpga-optimizations-to-compute-sparse-numerical-linear-algebra-kernels","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=24200","title":{"rendered":"Exploring FPGA Optimizations to Compute Sparse Numerical Linear Algebra Kernels"},"content":{"rendered":"<p>The solution of sparse triangular linear systems (sptrsv) is the bottleneck of many numerical methods. Thus, it is crucial to count with efficient implementations of such kernel, at least for commonly used platforms. In this sense, Field\u2013Programmable Gate Arrays (FPGAs) have evolved greatly in the last years, entering the HPC hardware ecosystem largely due to their superior energy\u2013efficiency relative to more established accelerators. Up until recently, the design for FPGAs implied the use of low\u2013level Hardware Description Languages (HDL) such as VHDL or Verilog. Nowadays, manufacturers are making a large effort to adopt High\u2013Level Synthesis languages like C\/C++ or OpenCL, but the gap between their performance and that of HDLs is not yet fully studied. This work focuses on the performance offered by FPGAs to compute the sptrsv using OpenCL. For this purpose, we implement different parallel variants of this kernel and experimentally evaluate several setups, varying among others the work\u2013group size, the number of compute units, the unroll\u2013factor and the vectorization\u2013factor.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The solution of sparse triangular linear systems (sptrsv) is the bottleneck of many numerical methods. Thus, it is crucial to count with efficient implementations of such kernel, at least for commonly used platforms. In this sense, Field\u2013Programmable Gate Arrays (FPGAs) have evolved greatly in the last years, entering the HPC hardware ecosystem largely due to [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[11,90,3],"tags":[1782,377,37,1793,298,67,2012],"class_list":["post-24200","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-opencl","category-paper","tag-computer-science","tag-fpga","tag-linear-algebra","tag-opencl","tag-optimization","tag-performance","tag-sparse"],"views":2130,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/24200","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=24200"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/24200\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=24200"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=24200"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=24200"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}