{"id":5161,"date":"2011-08-18T18:46:03","date_gmt":"2011-08-18T15:46:03","guid":{"rendered":"http:\/\/hgpu.org\/?p=5161"},"modified":"2011-08-18T21:09:48","modified_gmt":"2011-08-18T18:09:48","slug":"optimizing-the-exploitation-of-multicore-processors-and-gpus-with-openmp-and-opencl","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=5161","title":{"rendered":"Optimizing the exploitation of multicore processors and GPUs with OpenMP and OpenCL"},"content":{"rendered":"<p>In this paper, we present OMPSs, a programming model based on OpenMP and StarSs, that can also incorporate the use of OpenCL or CUDA kernels. We evaluate the proposal on three different architectures, SMP, Cell\/B.E. and GPUs, showing the wide usefulness of the approach. The evaluation is done with four different benchmarks, Matrix Multiply, BlackScholes, Perlin Noise, and Julia Set. We compare the results obtained with the execution of the same benchmarks written in OpenCL, in the same architectures. The results show that OMPSs greatly outperforms the OpenCL environment. It is more flexible to exploit multiple accelerators. And due to the simplicity of the annotations, it increases programmer&#8217;s productivity.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this paper, we present OMPSs, a programming model based on OpenMP and StarSs, that can also incorporate the use of OpenCL or CUDA kernels. We evaluate the proposal on three different architectures, SMP, Cell\/B.E. and GPUs, showing the wide usefulness of the approach. The evaluation is done with four different benchmarks, Matrix Multiply, BlackScholes, [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[11,89,90,3],"tags":[545,1782,14,106,20,1793,252,70],"class_list":["post-5161","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-nvidia-cuda","category-opencl","category-paper","tag-cell-processor","tag-computer-science","tag-cuda","tag-gpu-cluster","tag-nvidia","tag-opencl","tag-openmp","tag-programming-techniques"],"views":2095,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/5161","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5161"}],"version-history":[{"count":1,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/5161\/revisions"}],"predecessor-version":[{"id":5178,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/5161\/revisions\/5178"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5161"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5161"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5161"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}