https://hgpu.org/?p=24234
Efficient code generation for hardware accelerators by refining partially specified implementation