https://hgpu.org/?p=2879
Automatically Tuned Dense Linear Algebra for Multicore+GPU