https://hgpu.org/?p=19253
Automatic Performance Optimisation of Parallel Programs for GPUs via Rewrite Rules