https://hgpu.org/?p=7174
A Code Optimization Framework for Performance Portability of GPU Kernels onto Custom Accelerators