https://hgpu.org/?p=16028
Tensor Contractions with Extended BLAS Kernels on CPU and GPU