{"id":7388,"date":"2012-04-04T22:27:18","date_gmt":"2012-04-04T19:27:18","guid":{"rendered":"http:\/\/hgpu.org\/?p=7388"},"modified":"2012-04-04T22:27:18","modified_gmt":"2012-04-04T19:27:18","slug":"novel-gpu-implementation-of-jacobi-algorithm-for-karhunen-loeve-transform-of-dense-matrices","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=7388","title":{"rendered":"Novel GPU Implementation of Jacobi Algorithm for Karhunen-Loeve Transform of Dense Matrices"},"content":{"rendered":"<p>The Jacobi algorithm for the Karhunen-Loeve transform of a symmetric real matrix and its parallel implementation using the chess tournament algorithm are revisited in this paper. The impact of memory access patterns and the significance of memory coalescing on the performance of the GPU implementation of the parallel Jacobi algorithm are emphasized. Two novel memory access methods for the Jacobi algorithm are proposed. Simulation results show that one of the proposed methods achieves a 77.3% computational performance improvement over the traditional GPU methods and runs 73.5 times faster than the CPU for a dense symmetric square matrix of size 1,024.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Jacobi algorithm for the Karhunen-Loeve transform of a symmetric real matrix and its parallel implementation using the chess tournament algorithm are revisited in this paper. The impact of memory access patterns and the significance of memory coalescing on the performance of the GPU implementation of the parallel Jacobi algorithm are emphasized. 
Two novel memory access methods for the Jacobi [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[36,11,89,3],"tags":[1787,1782,14,37,20,1006],"class_list":["post-7388","post","type-post","status-publish","format-standard","hentry","category-algorithms","category-computer-science","category-nvidia-cuda","category-paper","tag-algorithms","tag-computer-science","tag-cuda","tag-linear-algebra","tag-nvidia","tag-tesla-c2070"],"views":2099,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/7388","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=7388"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/7388\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=7388"}],"wp:term":[{"taxonomy":"category","embeddable":true,"hre
f":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=7388"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=7388"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}