https://hgpu.org/?p=24939
Easy and Efficient Transformer: Scalable Inference Solution For large NLP model