https://hgpu.org/?p=17256
Implementing Efficient, Portable Computations for Machine Learning