https://hgpu.org/?p=3656
Exploring scalability of FIR filter realizations on Graphics Processing Units