https://hgpu.org/?p=18729
Supporting mixed-datatype matrix multiplication within the BLIS framework