https://hgpu.org/?p=1889
Bandwidth intensive 3-D FFT kernel for GPUs using CUDA