{"id":11364,"date":"2014-02-08T02:46:54","date_gmt":"2014-02-08T00:46:54","guid":{"rendered":"http:\/\/hgpu.org\/?p=11364"},"modified":"2014-02-08T02:46:54","modified_gmt":"2014-02-08T00:46:54","slug":"fast-2-d-ultrasound-strain-imaging-the-benefits-of-using-a-gpu","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=11364","title":{"rendered":"Fast 2-D Ultrasound Strain Imaging: The Benefits of Using a GPU"},"content":{"rendered":"<p>Deformation of tissue can be accurately estimated from radio-frequency ultrasound data using a 2-dimensional normalized cross correlation (NCC)-based algorithm. This procedure, however, is very computationally time-consuming. A major time reduction can be achieved by parallelizing the numerous computations of NCC. In this paper, two approaches for parallelization have been investigated: the OpenMP interface on a multi-CPU system and Compute Unified Device Architecture (CUDA) on a graphics processing unit (GPU). The performance of the OpenMP and GPU approaches were compared with a conventional Matlab implementation of NCC. The OpenMP approach with 8 threads achieved a maximum speed-up factor of 132 on the computing of NCC, whereas the GPU approach on an Nvidia Tesla K20 achieved a maximum speed-up factor of 376. Neither parallelization approach resulted in a significant loss in image quality of the elastograms. Parallelization of the NCC computations using the GPU, therefore, significantly reduces the computation time and increases the frame rate for motion estimation.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Deformation of tissue can be accurately estimated from radio-frequency ultrasound data using a 2-dimensional normalized cross correlation (NCC)-based algorithm. This procedure, however, is very computationally time-consuming. A major time reduction can be achieved by parallelizing the numerous computations of NCC. In this paper, two approaches for parallelization have been investigated: the OpenMP interface on a [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[36,89,33,3],"tags":[1787,14,1786,20,252,176,1390,208],"class_list":["post-11364","post","type-post","status-publish","format-standard","hentry","category-algorithms","category-nvidia-cuda","category-image-processing","category-paper","tag-algorithms","tag-cuda","tag-image-processing","tag-nvidia","tag-openmp","tag-package","tag-tesla-k20","tag-ultrasound"],"views":3623,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/11364","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=11364"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/11364\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=11364"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=11364"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=11364"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}