https://hgpu.org/?p=20207
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators