{"id":7983,"date":"2012-07-28T20:51:28","date_gmt":"2012-07-28T17:51:28","guid":{"rendered":"http:\/\/hgpu.org\/?p=7983"},"modified":"2012-07-28T20:51:28","modified_gmt":"2012-07-28T17:51:28","slug":"fast-linear-algebra-on-gpu","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=7983","title":{"rendered":"Fast Linear Algebra on GPU"},"content":{"rendered":"<p>GPUs have been used successfully to accelerate many mathematical functions and libraries. A common limitation of these libraries is the minimum size of the primitives they handle that is required to achieve a significant speedup over their CPU counterparts. This minimum size requirement can prove prohibitive for many applications. It can be loosened by batching operations so that enough data is available to perform the calculation with maximal efficiency on the GPU. This paper describes a fast OpenCL implementation of two basic vector functions &#8211; vector reduction and vector scaling. Its performance is analyzed by running benchmarks on two of the most common GPU architectures in use &#8211; Tesla and Fermi from NVIDIA. The reported experimental results show that our implementation significantly outperforms CUBLAS, the current state-of-the-art GPU-based basic linear algebra library.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>GPUs have been used successfully to accelerate many mathematical functions and libraries. A common limitation of these libraries is the minimum size of the primitives they handle that is required to achieve a significant speedup over their CPU counterparts. This minimum size requirement can prove prohibitive for many applications. 
It can be loosened by batching [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[11,90,3],"tags":[1782,238,37,20,253,1092,1793],"class_list":["post-7983","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-opencl","category-paper","tag-computer-science","tag-cublas","tag-linear-algebra","tag-nvidia","tag-nvidia-geforce-gtx-260","tag-nvidia-geforce-gtx-590","tag-opencl"],"views":2460,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/7983","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=7983"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/7983\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=7983"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=7983"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=7983"}],"curies":[{"name":"wp","href":"
https:\/\/api.w.org\/{rel}","templated":true}]}}