https://hgpu.org/?p=17426
Practically efficient methods for performing bit-reversed permutation in C++11 on the x86-64 architecture
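
For readers unfamiliar with the term, a bit-reversed permutation moves the element at index i to the index obtained by reversing the bits of i. The C++11 sketch below shows a naive in-place version for an array of length 2^bits. It is not taken from the paper and is not one of the efficient methods the title refers to; the names reverse_bits and bit_reversed_permute are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Reverse the lowest `bits` bits of `x` with a simple bit-by-bit loop.
// (Hypothetical helper; a naive baseline, not an optimized method.)
inline std::size_t reverse_bits(std::size_t x, unsigned bits) {
    std::size_t r = 0;
    for (unsigned i = 0; i < bits; ++i) {
        r = (r << 1) | (x & 1);  // shift the lowest bit of x into r
        x >>= 1;
    }
    return r;
}

// In-place bit-reversed permutation of a vector of length 2^bits.
// Assumption: the caller passes the matching `bits` for the vector size.
template <typename T>
void bit_reversed_permute(std::vector<T>& v, unsigned bits) {
    const std::size_t n = std::size_t(1) << bits;
    assert(v.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t j = reverse_bits(i, bits);
        if (i < j) {
            std::swap(v[i], v[j]);  // swap each (i, j) pair exactly once
        }
    }
}
```

With bits = 3, the indices 0..7 are rearranged so that, for example, the element at index 1 (binary 001) trades places with the element at index 4 (binary 100), while indices whose bit pattern is a palindrome, such as 0, 2, 5, and 7, stay where they are.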