https://hgpu.org/?p=11977
Non-separable 2D, 3D and 4D filtering with CUDA