{"id":18600,"date":"2018-11-11T17:59:03","date_gmt":"2018-11-11T15:59:03","guid":{"rendered":"https:\/\/hgpu.org\/?p=18600"},"modified":"2018-11-11T17:59:03","modified_gmt":"2018-11-11T15:59:03","slug":"a-hybrid-gpu-fpga-based-computing-platform-for-machine-learning","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=18600","title":{"rendered":"A Hybrid GPU-FPGA-based Computing Platform for Machine Learning"},"content":{"rendered":"<p>We present a hybrid GPU-FPGA based computing platform to tackle the high-density computing problem of machine learning. In our platform, the training part of a machine learning application is implemented on GPU and the inferencing part is implemented on FPGA. It should also include a model transplantation part which can transplant the model from the training part to the inferencing part. For evaluating this design methodology, we selected the LeNet-5 as our benchmark algorithm. During the training phase, GPU TitanXp&#8217;s speed was about 8.8x faster than CPU E-1620 and in the inferencing phase, FPGA Arria-10&#8217;s inferencing speed was fastest, 44.4x faster than CPU E-1620 and 6341x faster than GPU TitanXp. Moreover, by adopting our design methodology, we improved our LeNet-5 machine learning model&#8217;s accuracy from 99.05% to 99.13%, and successfully preserved the accuracy (99.13%) when transplanting the model from the GPU platform to the FPGA platform.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We present a hybrid GPU-FPGA based computing platform to tackle the high-density computing problem of machine learning. In our platform, the training part of a machine learning application is implemented on GPU and the inferencing part is implemented on FPGA. It should also include a model transplantation part which can transplant the model from the [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[36,11,90,3],"tags":[1787,451,330,1782,377,555,1025,20,1991,1793],"class_list":["post-18600","post","type-post","status-publish","format-standard","hentry","category-algorithms","category-computer-science","category-opencl","category-paper","tag-algorithms","tag-benchmarking","tag-cnn","tag-computer-science","tag-fpga","tag-hybrid-computing","tag-machine-learning","tag-nvidia","tag-nvidia-geforce-gtx-titan-xp","tag-opencl"],"views":2655,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/18600","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=18600"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/18600\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=18600"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=18600"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=18600"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}