BlaBla: Linguistic Feature Extraction for Clinical Analysis in Multiple Languages
Novoic Ltd
arXiv:2005.10219 [cs.CL], (20 May 2020)
@misc{shivkumar2020blabla,
title={BlaBla: Linguistic Feature Extraction for Clinical Analysis in Multiple Languages},
author={Abhishek Shivkumar and Jack Weston and Raphael Lenain and Emil Fristed},
year={2020},
eprint={2005.10219},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
We introduce BlaBla, an open-source Python library for extracting linguistic features with proven clinical relevance to neurological and psychiatric diseases across many languages. BlaBla is a unifying framework for accelerating and simplifying clinical linguistic research. The library is built on state-of-the-art NLP frameworks and supports multithreaded/GPU-enabled feature extraction via both native Python calls and a command line interface. We describe BlaBla’s architecture and the clinical validation of its features across 12 diseases. We further demonstrate the application of BlaBla to the task of visualizing and classifying language disorders in three languages on real clinical data from the AphasiaBank dataset. We make the codebase freely available to researchers with the hope of providing a consistent, well-validated foundation for the next generation of clinical linguistic research.
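To make "linguistic feature extraction" concrete, here is a minimal Python sketch of the general idea: turning raw text into a small dictionary of numeric features. It does not use BlaBla's API. The function names (`tokenize`, `split_sentences`, `extract_features`) and the chosen features are hypothetical illustrations, and the regex tokenizer is a simple stand-in for the NLP frameworks the abstract mentions.

```python
import re


def tokenize(text):
    """Return lowercase word tokens (regex stand-in for a real NLP tokenizer)."""
    # [^\W\d_] matches any Unicode letter, so this also handles non-English text.
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text.lower())


def split_sentences(text):
    """Split on runs of sentence-final punctuation (a rough heuristic)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def extract_features(text):
    """Compute a few simple lexical features; illustrative only, not BlaBla's feature set."""
    tokens = tokenize(text)
    sentences = split_sentences(text)
    n_tokens = len(tokens)
    return {
        "num_words": n_tokens,
        "num_sentences": len(sentences),
        # Share of distinct word forms among all words (lexical diversity).
        "type_token_ratio": len(set(tokens)) / n_tokens if n_tokens else 0.0,
        # Average number of words per sentence.
        "mean_sentence_length": n_tokens / len(sentences) if sentences else 0.0,
    }


if __name__ == "__main__":
    sample = "The boy is kicking the ball. The ball is red."
    print(extract_features(sample))
```

Features of this kind (word counts, lexical diversity, sentence length) are cheap to compute; a library built on full NLP pipelines can additionally use part-of-speech tags and syntactic parses, which a regex-based sketch like this cannot provide.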
May 24, 2020 by hgpu