https://hgpu.org/?p=8325
An Efficient GPU Implementation of Modified Discrete Cosine Transform Using CUDA