{"id":6124,"date":"2011-10-31T14:10:34","date_gmt":"2011-10-31T12:10:34","guid":{"rendered":"http:\/\/hgpu.org\/?p=6124"},"modified":"2011-10-31T14:10:34","modified_gmt":"2011-10-31T12:10:34","slug":"rapid-performance-of-a-generalized-distance-calculation","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=6124","title":{"rendered":"Rapid Performance of a Generalized Distance Calculation"},"content":{"rendered":"<p>The ever-increasing size of data sets and the need for real-time processing drive the demand for high-speed analysis. Since traditional CPUs are designed to execute a small number of sequential processes, they are ill-suited to keep pace with this growth and exploit the massive parallelism inherent in these problem spaces. In the last several years, the parallelism of GPUs has made them a viable solution for general-purpose computing. However, effective use of GPUs requires a significantly different programming paradigm. Towards the goal of creating a function library that maximizes the performance improvement of GPUs in data analysis and clustering, this paper presents an implementation of a general n-dimensional distance calculation commonly used in these types of algorithms. Experimental results show up to a 390x speedup using a Tesla C1060 and up to a 538x speedup using a GeForce GTX 480 over an Intel Core i7.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The ever-increasing size of data sets and the need for real-time processing drive the demand for high-speed analysis. Since traditional CPUs are designed to execute a small number of sequential processes, they are ill-suited to keep pace with this growth and exploit the massive parallelism inherent in these problem spaces. In the last several [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[36,11,89,3],"tags":[1787,468,1782,14,20,379,199],"class_list":["post-6124","post","type-post","status-publish","format-standard","hentry","category-algorithms","category-computer-science","category-nvidia-cuda","category-paper","tag-algorithms","tag-clustering","tag-computer-science","tag-cuda","tag-nvidia","tag-nvidia-geforce-gtx-480","tag-tesla-c1060"],"views":2072,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/6124","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=6124"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/6124\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=6124"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=6124"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=6124"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}