{"id":3179,"date":"2011-03-12T13:11:47","date_gmt":"2011-03-12T13:11:47","guid":{"rendered":"http:\/\/hgpu.org\/?p=3179"},"modified":"2011-03-12T13:11:47","modified_gmt":"2011-03-12T13:11:47","slug":"gemm-on-a-gpu","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=3179","title":{"rendered":"GEMM on a GPU"},"content":{"rendered":"<p>Matrix-matrix multiplication is the most important operation in high-performance linear algebra. If your application can cast most of its computation in terms of the level-3 BLAS operations, it can achieve very high performance. For this reason, implementations of the Basic Linear Algebra Subprograms (BLAS) tend to heavily optimize this operation. With Graphics Processing Units (GPUs) on the rise in the field of high-performance computing, exposing the parallelism in this operation becomes increasingly important.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Matrix-matrix multiplication is the most important operation in high-performance linear algebra. If your application can cast most of its computation in terms of the level-3 BLAS operations, it can achieve very high performance. For this reason, implementations of the Basic Linear Algebra Subprograms (BLAS) tend to heavily optimize this operation. 
With Graphics Processing Units (GPUs) on the [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[11,89,3],"tags":[430,1782,14,37,20,710,958],"class_list":["post-3179","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-nvidia-cuda","category-paper","tag-blas","tag-computer-science","tag-cuda","tag-linear-algebra","tag-nvidia","tag-nvidia-quadro-fx-5800","tag-poster"],"views":2804,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/3179","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3179"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/3179\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3179"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3179"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3179"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}