https://hgpu.org/?p=28301
Communication-minimizing Asynchronous Tensor Parallelism