{"id":8906,"date":"2013-02-09T22:59:33","date_gmt":"2013-02-09T20:59:33","guid":{"rendered":"http:\/\/hgpu.org\/?p=8906"},"modified":"2013-02-09T22:59:33","modified_gmt":"2013-02-09T20:59:33","slug":"enabling-inter-machine-parallelism-in-high-level-languages-with-sejits-and-mapreduce","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=8906","title":{"rendered":"Enabling Inter-Machine Parallelism in High-Level Languages with SEJITS and MapReduce"},"content":{"rendered":"<p>Selective, embedded, just-in-time specialization (SEJITS) is a technique for optimizing embedded domain-specific languages through the use of specializers, or code modules developed by expert programmers that target particular accelerators such as multicore processors and GPUs via just-in-time compilation. We extend SEJITS to exploit inter-machine parallelism by targeting clusters of machines via MapReduce. Our work enables the development of specializers for large, data-parallel applications whose work flows can be cast as MapReduce operations. We present an implementation that targets Hadoop and we describe specializers for two applications. The first, a pure-Python protein docking application, requires a 1-line change to realize a 280x speedup on a cluster with 450 cores. The second, an audio processing application, demonstrates our approach&#8217;s ability to leverage clusters of GPU-equipped machines by composing parallel programming patterns. Results indicate that clusters are viable targets for specialization, and that pattern composition is a useful technique for managing multi-level parallelism.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Selective, embedded, just-in-time specialization (SEJITS) is a technique for optimizing embedded domain-specific languages through the use of specializers, or code modules developed by expert programmers that target particular accelerators such as multicore processors and GPUs via just-in-time compilation. We extend SEJITS to exploit inter-machine parallelism by targeting clusters of machines via MapReduce. 
Tags: Computer science, CUDA, MapReduce, nVidia, Programming techniques, Python, Tesla M2050