https://hgpu.org/?p=26890
Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numerical Behaviors