https://hgpu.org/?p=4404
Accelerating batched 1D-FFT with a CUDA-capable computer