https://hgpu.org/?p=24938
tcFFT: Accelerating Half-Precision FFT through Tensor Cores