high performance computing on graphics processing units: hgpu.org

SciDef: Automating Definition Extraction from Academic Literature with Large Language Models

Filip Kučera, Christoph Mandl, Isao Echizen, Radu Timofte, Timo Spinde

National Institute of Informatics (NII), Tokyo, Japan

arXiv:2602.05413 [cs.IR], (5 Feb 2026)

DOI:10.48550/arXiv.2602.05413

@misc{kučera2026scidef,
  title={SciDef: Automating Definition Extraction from Academic Literature with Large Language Models},
  author={Filip Kučera and Christoph Mandl and Isao Echizen and Radu Timofte and Timo Spinde},
  year={2026},
  eprint={2602.05413},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2602.05413}
}

Source code (package): SciDef: Automated Definition Extraction from Scientific Literature

Definitions are the foundation of any scientific work, but with publication numbers rising sharply, gathering the definitions relevant to a given keyword has become challenging. We therefore introduce SciDef, an LLM-based pipeline for automated definition extraction. We test SciDef on DefExtra and DefSim, novel datasets of human-extracted definitions and of definition-pair similarity, respectively. Evaluating 16 language models across prompting strategies, we demonstrate that multi-step and DSPy-optimized prompting improve extraction performance. To evaluate extraction, we test various metrics and show that an NLI-based method yields the most reliable results. We show that LLMs are largely able to extract definitions from scientific literature (86.4% of the definitions in our test set); yet future work should focus not just on finding definitions but on identifying the relevant ones, as models tend to over-generate them.
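The abstract separates two questions: how many reference definitions a model finds, and how many of its outputs are actually relevant (the over-generation concern). The sketch below illustrates that distinction only; it is not the SciDef evaluation code. The function names, the 0.5 threshold, and the matching rule are assumptions made for illustration, and `token_overlap` is a toy stand-in for a real scorer such as the NLI-based method the abstract mentions.

```python
from typing import Callable, List, Tuple


def token_overlap(a: str, b: str) -> float:
    """Toy similarity (assumption): Jaccard overlap of lowercase tokens.

    This is NOT an NLI model; it only stands in for a pluggable scorer.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match_definitions(
    gold: List[str],
    extracted: List[str],
    score: Callable[[str, str], float] = token_overlap,
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Tuple[float, float]:
    """Return (recall, precision) under a thresholded best-match rule."""
    # A gold definition counts as found if any extracted one scores high enough.
    found = sum(1 for g in gold if any(score(g, e) >= threshold for e in extracted))
    # An extracted definition counts as relevant if it matches any gold one.
    relevant = sum(1 for e in extracted if any(score(g, e) >= threshold for g in gold))
    recall = found / len(gold) if gold else 0.0
    precision = relevant / len(extracted) if extracted else 0.0
    return recall, precision


# Made-up example data: one gold definition, three model outputs.
gold = ["media bias is slanted news coverage"]
extracted = [
    "media bias is slanted news coverage",
    "a corpus is a collection of texts",
    "we thank the reviewers",
]
recall, precision = match_definitions(gold, extracted)
```

In this made-up example every gold definition is found, so recall is high, while two of the three outputs match nothing, so precision is low. Swapping `score` for an entailment-based scorer would change the matching behavior without changing this bookkeeping.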

Tags: Computer science, Data mining, LLM, NLP, Package

February 8, 2026 by hgpu
