https://hgpu.org/?p=18404
Large Scale Language Modeling: Converging on 40GB of Text in Four Hours