Biomedical and Clinical English Model Packages in the Stanza Python NLP Library
Stanford University, Stanford, CA 94305
arXiv:2007.14640 [cs.CL] (29 Jul 2020)
@misc{zhang2020biomedical,
title={Biomedical and Clinical English Model Packages in the Stanza Python NLP Library},
author={Yuhao Zhang and Yuhui Zhang and Peng Qi and Christopher D. Manning and Curtis P. Langlotz},
year={2020},
eprint={2007.14640},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
We introduce biomedical and clinical English model packages for the Stanza Python NLP library. These packages offer accurate syntactic analysis and named entity recognition capabilities for biomedical and clinical text, by combining Stanza’s fully neural architecture with a wide variety of open datasets as well as large-scale unsupervised biomedical and clinical text data. We show via extensive experiments that our packages achieve syntactic analysis and named entity recognition performance that is on par with or surpasses state-of-the-art results. We further show that these models do not compromise speed compared to existing toolkits when GPU acceleration is available, and are made easy to download and use with Stanza’s Python interface. A demonstration of our packages is available.
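As a rough illustration of the download-and-use workflow the abstract mentions, the sketch below loads a clinical package through Stanza's `stanza.download` and `stanza.Pipeline` calls and lists the recognized entities. The package name `mimic` and the NER model name `i2b2` are assumptions taken from Stanza's public documentation, not from the text above, and the helper names `build_pipeline_kwargs` and `extract_entities` are hypothetical.

```python
from typing import Dict, List, Tuple


def build_pipeline_kwargs(package: str = "mimic", ner_model: str = "i2b2") -> Dict[str, object]:
    """Keyword arguments passed to both stanza.download and stanza.Pipeline.

    The defaults ("mimic" for syntactic analysis, "i2b2" for clinical NER)
    are assumed package names; substitute whichever packages you need.
    """
    return {"package": package, "processors": {"ner": ner_model}}


def extract_entities(text: str, package: str = "mimic", ner_model: str = "i2b2") -> List[Tuple[str, str]]:
    """Run the selected pipeline on `text` and return (entity text, entity type) pairs."""
    # Imported lazily so build_pipeline_kwargs stays usable without Stanza installed.
    import stanza

    kwargs = build_pipeline_kwargs(package, ner_model)
    stanza.download("en", **kwargs)        # fetches the model files on first use
    nlp = stanza.Pipeline("en", **kwargs)  # runs on a GPU by default when one is available
    doc = nlp(text)
    return [(ent.text, ent.type) for ent in doc.entities]
```

Calling `extract_entities` on a sentence from a clinical note should return the entity spans and their types; the first call needs network access to download the models.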
August 2, 2020 by hgpu