https://hgpu.org/?p=921
Techniques for efficient DCT/IDCT implementation on generic GPU