https://hgpu.org/?p=5478
Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators