https://hgpu.org/?p=19136
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism