{"id":5493,"date":"2011-09-07T11:39:31","date_gmt":"2011-09-07T08:39:31","guid":{"rendered":"http:\/\/hgpu.org\/?p=5493"},"modified":"2011-09-07T11:39:31","modified_gmt":"2011-09-07T08:39:31","slug":"cuda-level-performance-with-python-level-productivity-for-gaussian-mixture-model-applications","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=5493","title":{"rendered":"CUDA-level performance with python-level productivity for Gaussian mixture model applications"},"content":{"rendered":"<p>Typically, scientists with computational needs prefer to use high-level languages such as Python or MATLAB; however, large computationally-intensive problems must eventually be recoded in a low level language such as C or Fortran by expert programmers in order to achieve sufficient performance. In addition, multiple strategies may exist for mapping a problem onto parallel hardware depending on the input data size and the hardware parameters. We show how to preserve the productivity of high-level languages while obtaining the performance of the best low-level language code variant for a given hardware platform and problem size using SEJITS, a set of techniques that leverages just-in-time code generation and compilation. As a case study, we demonstrate our technique for Gaussian Mixture Model training using the EM algorithm. With the addition of one line of code to import our framework, a domain programmer using an existing Python GMM library can run her program unmodified on a GPU-equipped computer and achieve performance that meets or beats GPU code hand-crafted by a human expert. We also show that despite the overhead of allowing the domain expert&#8217;s program to use Python and the overhead of just-in-time code generation and compilation, our approach still results in performance competitive with hand-crafted GPU code.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Typically, scientists with computational needs prefer to use high-level languages such as Python or MATLAB; however, large computationally-intensive problems must eventually be recoded in a low level language such as C or Fortran by expert programmers in order to achieve sufficient performance. In addition, multiple strategies may exist for mapping a problem onto parallel hardware [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[36,11,89,90,3],"tags":[1787,215,1782,14,20,1793,513],"class_list":["post-5493","post","type-post","status-publish","format-standard","hentry","category-algorithms","category-computer-science","category-nvidia-cuda","category-opencl","category-paper","tag-algorithms","tag-code-generation","tag-computer-science","tag-cuda","tag-nvidia","tag-opencl","tag-python"],"views":2204,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/5493","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5493"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/5493\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5493"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5493"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5493"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}