{"id":17426,"date":"2017-08-08T09:11:14","date_gmt":"2017-08-08T06:11:14","guid":{"rendered":"https:\/\/hgpu.org\/?p=17426"},"modified":"2017-08-08T09:11:14","modified_gmt":"2017-08-08T06:11:14","slug":"practically-efficient-methods-for-performing-bit-reversed-permutation-in-c11-on-the-x86-64-architecture","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=17426","title":{"rendered":"Practically efficient methods for performing bit-reversed permutation in C++11 on the x86-64 architecture"},"content":{"rendered":"<p>The bit-reversed permutation is a famous task in signal processing and is key to efficient implementation of the fast Fourier transform. This paper presents optimized C++11 implementations of five extant methods for computing the bit-reversed permutation: Stockham auto-sort, naive bitwise swapping, swapping via a table of reversed bytes, local pairwise swapping of bits, and swapping via a cache-localized matrix buffer. Three new strategies for performing the bit-reversed permutation in C++11 are proposed: an inductive method using the bitwise XOR operation, a template-recursive closed form, and a cache-oblivious template-recursive approach, which reduces the bit-reversed permutation to smaller bit-reversed permutations and a square matrix transposition. These new methods are compared to the extant approaches in terms of theoretical runtime, empirical compile time, and empirical runtime. The template-recursive cache-oblivious method is shown to be competitive with the fastest known method; however, we demonstrate that the cache-oblivious method can more readily benefit from parallelization on multiple cores and on the GPU.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The bit-reversed permutation is a famous task in signal processing and is key to efficient implementation of the fast Fourier transform. This paper presents optimized C++11 implementations of five extant methods for computing the bit-reversed permutation: Stockham auto-sort, naive bitwise swapping, swapping via a table of reversed bytes, local pairwise swapping of bits, and swapping [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[11,89,3,41],"tags":[1782,14,597,20,252,176,1789],"class_list":["post-17426","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-nvidia-cuda","category-paper","category-signal-processing","tag-computer-science","tag-cuda","tag-mathematical-software","tag-nvidia","tag-openmp","tag-package","tag-signal-processing"],"views":2412,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/17426","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=17426"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/17426\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=17426"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=17426"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=17426"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}