https://hgpu.org/?p=4887
High-Throughput Sequence Translation Using CUDA