{"id":5698,"date":"2011-09-26T16:10:12","date_gmt":"2011-09-26T13:10:12","guid":{"rendered":"http:\/\/hgpu.org\/?p=5698"},"modified":"2011-09-26T16:10:12","modified_gmt":"2011-09-26T13:10:12","slug":"generating-gpu-code-from-a-high-level-representation-for-image-processing-kernels","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=5698","title":{"rendered":"Generating GPU Code from a High-level Representation for Image Processing Kernels"},"content":{"rendered":"<p>We present a framework for representing image processing kernels based on decoupled access\/execute metadata, which allow the programmer to specify both execution constraints and memory access pattern of a kernel. The framework performs source-to-source translation of kernels expressed in high-level framework-specific C++ classes into low-level CUDA or OpenCL code with effective device-dependent optimizations such as global memory padding for memory coalescing and optimal memory bandwidth utilization. We evaluate the framework on several image filters, comparing generated code against highly optimized CPU and GPU versions in the popular OpenCV library.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We present a framework for representing image processing kernels based on decoupled access\/execute metadata, which allow the programmer to specify both execution constraints and memory access pattern of a kernel. 
The framework performs source-to-source translation of kernels expressed in high-level framework-specific C++ classes into low-level CUDA or OpenCL code with effective device-dependent optimizations such as [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[89,33,90,3],"tags":[7,455,215,955,14,1786,20,234,710,1793,298,378],"class_list":["post-5698","post","type-post","status-publish","format-standard","hentry","category-nvidia-cuda","category-image-processing","category-opencl","category-paper","tag-ati","tag-ati-radeon-hd-5870","tag-code-generation","tag-compilers","tag-cuda","tag-image-processing","tag-nvidia","tag-nvidia-geforce-gtx-280","tag-nvidia-quadro-fx-5800","tag-opencl","tag-optimization","tag-tesla-c2050"],"views":2383,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/5698","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%
2Fv2%2Fcomments&post=5698"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/5698\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5698"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5698"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5698"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}