SciDef: Automating Definition Extraction from Academic Literature with Large Language Models
National Institute of Informatics (NII), Tokyo, Japan
arXiv:2602.05413 [cs.IR] (5 Feb 2026)
@misc{kučera2026scidef,
title={SciDef: Automating Definition Extraction from Academic Literature with Large Language Models},
author={Filip Kučera and Christoph Mandl and Isao Echizen and Radu Timofte and Timo Spinde},
year={2026},
eprint={2602.05413},
archivePrefix={arXiv},
primaryClass={cs.IR},
url={https://arxiv.org/abs/2602.05413}
}
Definitions are the foundation of any scientific work, but with the significant increase in the number of publications, gathering definitions relevant to a given keyword has become challenging. We therefore introduce SciDef, an LLM-based pipeline for automated definition extraction. We test SciDef on DefExtra and DefSim, novel datasets of human-extracted definitions and of definition-pair similarity, respectively. Evaluating 16 language models across prompting strategies, we demonstrate that multi-step and DSPy-optimized prompting improve extraction performance. To evaluate extraction, we test various metrics and show that an NLI-based method yields the most reliable results. We show that LLMs are largely able to extract definitions from scientific literature (86.4% of definitions in our test set); yet future work should focus not just on finding definitions but on identifying relevant ones, as models tend to over-generate them.
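The abstract reports two separate effects: most human-extracted definitions are found (86.4% of the test set), yet models over-generate. A matching-based evaluation keeps these apart. The minimal sketch below assumes that a gold definition counts as found when some extracted definition is judged equivalent to it. Equivalence here means bidirectional entailment under a pluggable `entails(premise, hypothesis)` judge. This criterion is an assumption for illustration and is not necessarily the paper's NLI-based metric. The names `match_scores`, `equivalent`, and `toy_entails` are hypothetical. `toy_entails` is a string-equality placeholder, not an NLI model.

```python
from typing import Callable, Sequence

# Hypothetical judge: returns True if `premise` entails `hypothesis`.
# In a real evaluation this would wrap an NLI model (assumption).
Entails = Callable[[str, str], bool]


def equivalent(a: str, b: str, entails: Entails) -> bool:
    """Assumed matching rule: two definitions match if each entails the other."""
    return entails(a, b) and entails(b, a)


def match_scores(
    gold: Sequence[str], predicted: Sequence[str], entails: Entails
) -> tuple[float, float]:
    """Return (recall, precision) of predicted definitions against gold ones.

    recall    = share of gold definitions matched by at least one prediction
    precision = share of predictions matching at least one gold definition
    Over-generation lowers precision but leaves recall unchanged.
    """
    if not gold or not predicted:
        # Degenerate inputs: report zero rather than dividing by zero.
        return 0.0, 0.0
    recalled = sum(any(equivalent(g, p, entails) for p in predicted) for g in gold)
    supported = sum(any(equivalent(p, g, entails) for g in gold) for p in predicted)
    return recalled / len(gold), supported / len(predicted)


def toy_entails(premise: str, hypothesis: str) -> bool:
    # Placeholder only, NOT NLI: case- and whitespace-insensitive equality.
    norm = lambda s: " ".join(s.lower().split())
    return norm(premise) == norm(hypothesis)


# Hypothetical toy data: one gold definition is found, and two extra
# predictions have no gold counterpart (over-generation).
gold = [
    "A bias is a systematic deviation.",
    "Framing is the selection of aspects of reality.",
]
predicted = [
    "a bias is a  systematic deviation.",
    "Noise is random error.",
    "A corpus is a collection of texts.",
]
recall, precision = match_scores(gold, predicted, toy_entails)
print(f"recall={recall:.3f} precision={precision:.3f}")  # recall=0.500 precision=0.333
```

Swapping `toy_entails` for a real NLI judge changes only which pairs match. The split between recall, where the paper reports strong results, and precision, where over-generation shows up, stays the same.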
February 8, 2026 by hgpu