{"id":24776,"date":"2021-03-28T12:24:20","date_gmt":"2021-03-28T09:24:20","guid":{"rendered":"https:\/\/hgpu.org\/?p=24776"},"modified":"2021-03-28T12:24:20","modified_gmt":"2021-03-28T09:24:20","slug":"de-specializing-an-hls-library-for-deep-neural-networks-improvements-upon-hls4ml","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=24776","title":{"rendered":"De-specializing an HLS library for Deep Neural Networks: improvements upon hls4ml"},"content":{"rendered":"<p>Custom hardware accelerators for Deep Neural Networks are increasingly popular: the flexibility and performance offered by FPGAs are well suited to the computational demands and low-latency constraints of many image recognition and natural language processing tasks. The gap between high-level Machine Learning frameworks (e.g., TensorFlow, PyTorch) and low-level hardware design in Verilog\/VHDL creates a barrier to the widespread adoption of FPGAs, which can be overcome with the help of High-Level Synthesis. hls4ml is a framework that translates Deep Neural Networks into annotated C++ code for High-Level Synthesis, offering a complete and user-friendly design process that has been enthusiastically adopted in physics research. We analyze the strengths and weaknesses of hls4ml and draft a plan to enhance its core library of components in order to allow more advanced optimizations, target a wider selection of FPGAs, and support larger Neural Network models.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Custom hardware accelerators for Deep Neural Networks are increasingly popular: the flexibility and performance offered by FPGAs are well suited to the computational demands and low-latency constraints of many image recognition and natural language processing tasks. 
The gap between high-level Machine Learning frameworks (e.g., TensorFlow, PyTorch) and low-level hardware design in Verilog\/VHDL [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[11,3,12],"tags":[1782,377,633,901,1025,34,1783],"class_list":["post-24776","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-paper","category-physics","tag-computer-science","tag-fpga","tag-hardware-architecture","tag-image-recognition","tag-machine-learning","tag-neural-networks","tag-physics"],"views":1904,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/24776","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=24776"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/24776\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=24776"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=24776"},{"taxonomy":"post_tag","embeddable":true
,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=24776"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}