{"id":29806,"date":"2025-03-10T19:03:17","date_gmt":"2025-03-10T17:03:17","guid":{"rendered":"https:\/\/hgpu.org\/?p=29806"},"modified":"2025-03-10T19:03:17","modified_gmt":"2025-03-10T17:03:17","slug":"superman-efficient-permanent-computation-on-gpus","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=29806","title":{"rendered":"SUperman: Efficient Permanent Computation on GPUs"},"content":{"rendered":"<p>The permanent is a function, defined for a square matrix, with applications in various domains including quantum computing, statistical physics, complexity theory, combinatorics, and graph theory. Its formula is similar to that of the determinant, however unlike the determinant, its exact computation is #P-complete, i.e., there is no algorithm to compute the permanent in polynomial time unless P=NP. For an n\u00d7n matrix, the fastest algorithm has a time complexity of O(2n\u22121n). Although supercomputers have been employed for permanent computation before, there is no work and more importantly, no publicly available software that leverages cutting-edge, yet widely accessible, High-Performance Computing accelerators such as GPUs. In this work, we designed, developed, and investigated the performance of SUperman, a complete software suite that can compute matrix permanents on multiple nodes\/GPUs on a cluster while handling various matrix types, e.g., real\/complex\/binary and sparse\/dense etc., with a unique treatment for each type. Compared to a state-of-the-art parallel algorithm on 44 cores, SUperman can be 86\u00d7 faster on a single Nvidia A100 GPU. Combining multiple GPUs, we also showed that SUperman can compute the permanent of a 56\u00d756 matrix which is the largest reported in the literature.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The permanent is a function, defined for a square matrix, with applications in various domains including quantum computing, statistical physics, complexity theory, combinatorics, and graph theory. Its formula is similar to that of the determinant, however unlike the determinant, its exact computation is #P-complete, i.e., there is no algorithm to compute the permanent in polynomial [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[11,89,3],"tags":[1782,14,1682,628,20,2066,2018,680,176],"class_list":["post-29806","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-nvidia-cuda","category-paper","tag-computer-science","tag-cuda","tag-hpc","tag-numerical-analysis","tag-nvidia","tag-nvidia-a100","tag-nvidia-quadro-gv100","tag-openmpi","tag-package"],"views":1142,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/29806","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=29806"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/29806\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=29806"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=29806"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=29806"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}