https://hgpu.org/?p=12440
Two-way partitioning of a recursive Gaussian filter in CUDA