
SciDef: Automating Definition Extraction from Academic Literature with Large Language Models

Filip Kučera, Christoph Mandl, Isao Echizen, Radu Timofte, Timo Spinde
National Institute of Informatics (NII), Tokyo, Japan
arXiv:2602.05413 [cs.IR], 5 Feb 2026

@misc{kučera2026scidef,
   title={SciDef: Automating Definition Extraction from Academic Literature with Large Language Models},
   author={Filip Kučera and Christoph Mandl and Isao Echizen and Radu Timofte and Timo Spinde},
   year={2026},
   eprint={2602.05413},
   archivePrefix={arXiv},
   primaryClass={cs.IR},
   url={https://arxiv.org/abs/2602.05413}
}

Definitions are the foundation of any scientific work, but as publication numbers grow sharply, gathering the definitions relevant to a given keyword has become challenging. We therefore introduce SciDef, an LLM-based pipeline for automated definition extraction. We test SciDef on DefExtra and DefSim, two novel datasets of human-extracted definitions and of definition-pair similarity, respectively. Evaluating 16 language models across several prompting strategies, we show that multi-step and DSPy-optimized prompting improve extraction performance. To evaluate extraction quality, we compare several metrics and find that an NLI-based method yields the most reliable results. LLMs are largely able to extract definitions from scientific literature, recovering 86.4% of the definitions in our test set. However, because models tend to over-generate definitions, future work should focus not only on finding definitions but on identifying the relevant ones.
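The abstract reports that an NLI-based metric gave the most reliable evaluation and that models tend to over-generate definitions. The sketch below illustrates how such an evaluation could work under our own assumptions; it is not SciDef's evaluation code. It assumes a predicted definition matches a gold one when entailment holds in both directions, then computes recall (how we read the "86.4% of definitions" figure) and precision (which over-generation lowers). The names `toy_entails`, `is_match`, and `recall_precision` are hypothetical, and `toy_entails` is a crude token-overlap stand-in for a real NLI model.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: entails(premise, hypothesis) -> True if premise entails hypothesis.
# In a real setup an NLI model would sit behind this; here it is a toy stand-in.
EntailFn = Callable[[str, str], bool]


def toy_entails(premise: str, hypothesis: str, threshold: float = 0.8) -> bool:
    """Toy stand-in for an NLI model: the premise 'entails' the hypothesis if it
    contains at least `threshold` of the hypothesis's distinct lowercase tokens."""
    hyp_tokens = set(hypothesis.lower().split())
    if not hyp_tokens:
        return False
    prem_tokens = set(premise.lower().split())
    return len(hyp_tokens & prem_tokens) / len(hyp_tokens) >= threshold


def is_match(pred: str, gold: str, entails: EntailFn) -> bool:
    # Assumed matching rule: entailment must hold in both directions,
    # so a much broader or much narrower definition does not count.
    return entails(pred, gold) and entails(gold, pred)


def recall_precision(
    preds: List[str], golds: List[str], entails: EntailFn
) -> Tuple[float, float]:
    # Recall: share of gold definitions matched by at least one prediction.
    found = sum(any(is_match(p, g, entails) for p in preds) for g in golds)
    # Precision: share of predictions that match at least one gold definition;
    # over-generated, irrelevant definitions pull this down.
    correct = sum(any(is_match(p, g, entails) for g in golds) for p in preds)
    recall = found / len(golds) if golds else 0.0
    precision = correct / len(preds) if preds else 0.0
    return recall, precision


golds = [
    "a corpus is a large structured collection of texts",
    "a lemma is the canonical form of a word",
]
preds = [
    "a corpus is a large structured collection of texts",
    "a token is a single unit of text",  # extra definition not in the gold set
    "the lemma is the canonical form of a word",
]

r, p = recall_precision(preds, golds, toy_entails)
print(f"recall={r:.3f} precision={p:.3f}")
```

With these toy inputs the script prints `recall=1.000 precision=0.667`: both gold definitions are recovered, but the extra definition lowers precision, mirroring the over-generation issue the abstract describes. Requiring entailment in both directions is one design choice among several; a one-directional or score-thresholded rule would be more lenient.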