Practically efficient methods for performing bit-reversed permutation in C++11 on the x86-64 architecture

Christian Knauth, Boran Adas, Daniel Whitfield, Xuesong Wang, Lydia Ickler, Tim Conrad, Oliver Serang
Freie Universität Berlin, Institut für Informatik
arXiv:1708.01873 [cs.MS], (2 Aug 2017)











The bit-reversed permutation is a famous task in signal processing and is key to efficient implementation of the fast Fourier transform. This paper presents optimized C++11 implementations of five extant methods for computing the bit-reversed permutation: Stockham auto-sort, naive bitwise swapping, swapping via a table of reversed bytes, local pairwise swapping of bits, and swapping via a cache-localized matrix buffer. Three new strategies for performing the bit-reversed permutation in C++11 are proposed: an inductive method using the bitwise XOR operation, a template-recursive closed form, and a cache-oblivious template-recursive approach, which reduces the bit-reversed permutation to smaller bit-reversed permutations and a square matrix transposition. These new methods are compared to the extant approaches in terms of theoretical runtime, empirical compile time, and empirical runtime. The template-recursive cache-oblivious method is shown to be competitive with the fastest known method; however, we demonstrate that the cache-oblivious method can more readily benefit from parallelization on multiple cores and on the GPU.
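For orientation, the simplest of the extant methods named above, naive bitwise swapping, might look roughly like the following C++11 sketch. It is an illustration written for this summary, not the authors' optimized implementation: the function names `reverse_bits` and `naive_bit_reversed_permutation` are hypothetical, and the input is assumed to hold exactly 2^log_n elements.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Reverse the lowest `log_n` bits of `index`, one bit per iteration.
// (Hypothetical helper name; not taken from the paper.)
inline std::uint64_t reverse_bits(std::uint64_t index, unsigned log_n) {
    std::uint64_t reversed = 0;
    for (unsigned bit = 0; bit < log_n; ++bit) {
        reversed = (reversed << 1) | (index & 1u);  // append the lowest bit
        index >>= 1;                                // move on to the next bit
    }
    return reversed;
}

// In-place bit-reversed permutation of a vector with 2^log_n elements.
// Each index i is swapped with its bit-reversed partner j exactly once.
template <typename T>
void naive_bit_reversed_permutation(std::vector<T>& data, unsigned log_n) {
    const std::uint64_t n = std::uint64_t(1) << log_n;
    assert(data.size() == n);  // assumption: length is exactly 2^log_n
    for (std::uint64_t i = 0; i < n; ++i) {
        const std::uint64_t j = reverse_bits(i, log_n);
        if (i < j) {  // swap only once per pair; skip self-paired indices
            std::swap(data[i], data[j]);
        }
    }
}
```

Applied to the eight values 0 through 7 with log_n = 3, this sketch reorders them to 0, 4, 2, 6, 1, 5, 3, 7. Each index costs a loop over log_n bits, and the swap partners are scattered across memory, which is the kind of overhead that the table-based and cache-aware methods in the abstract are designed to reduce.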