IR CL Feb 5

SciDef: Automating Definition Extraction from Academic Literature with Large Language Models

Filip Kučera, Christoph Mandl, Isao Echizen, Radu Timofte, Timo Spinde

arXiv:2602.05413v12.3 h-index: 98 Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of gathering definitions from the growing volume of academic publications. The advance is incremental because it builds on existing LLM methods.

The authors tackle definition extraction from scientific literature by introducing SciDef, an LLM-based pipeline, and report that it extracts 86.4% of the definitions in their test set.

Definitions are the foundation for any scientific work, but with a significant increase in publication numbers, gathering definitions relevant to any keyword has become challenging. We therefore introduce SciDef, an LLM-based pipeline for automated definition extraction. We test SciDef on DefExtra & DefSim, novel datasets of human-extracted definitions and definition-pairs' similarity, respectively. Evaluating 16 language models across prompting strategies, we demonstrate that multi-step and DSPy-optimized prompting improve extraction performance. To evaluate extraction, we test various metrics and show that an NLI-based method yields the most reliable results. We show that LLMs are largely able to extract definitions from scientific literature (86.4% of definitions from our test-set); yet future work should focus not just on finding definitions, but on identifying relevant ones, as models tend to over-generate them. Code & datasets are available at https://github.com/Media-Bias-Group/SciDef.
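The abstract does not spell out how the NLI-based evaluation is computed, so the following is only a rough sketch of one way such a metric could work, not the authors' implementation. It assumes a gold definition counts as "found" when some extracted definition and the gold one entail each other above a threshold. The names `toy_entailment`, `is_match`, `evaluate`, and the threshold value are hypothetical, and the token-overlap scorer is a stand-in for a real NLI model.

```python
from typing import Callable, Dict, List

# (premise, hypothesis) -> score in [0, 1]; a real setup would call an NLI model here.
Scorer = Callable[[str, str], float]


def toy_entailment(premise: str, hypothesis: str) -> float:
    """Placeholder scorer: fraction of hypothesis tokens that occur in the premise.

    This is NOT natural language inference; it only keeps the sketch runnable.
    """
    premise_tokens = set(premise.lower().split())
    hypothesis_tokens = hypothesis.lower().split()
    if not hypothesis_tokens:
        return 0.0
    hits = sum(token in premise_tokens for token in hypothesis_tokens)
    return hits / len(hypothesis_tokens)


def is_match(a: str, b: str, scorer: Scorer, threshold: float) -> bool:
    """Assumed matching rule: both directions must score at or above the threshold."""
    return min(scorer(a, b), scorer(b, a)) >= threshold


def evaluate(
    gold: List[str],
    predicted: List[str],
    scorer: Scorer = toy_entailment,
    threshold: float = 0.8,  # hypothetical value, not taken from the paper
) -> Dict[str, float]:
    """Recall: share of gold definitions matched by some prediction.
    Precision: share of predictions matched by some gold definition."""
    matched_gold = sum(
        any(is_match(g, p, scorer, threshold) for p in predicted) for g in gold
    )
    matched_pred = sum(
        any(is_match(g, p, scorer, threshold) for g in gold) for p in predicted
    )
    return {
        "recall": matched_gold / len(gold) if gold else 0.0,
        "precision": matched_pred / len(predicted) if predicted else 0.0,
    }


# Made-up example data: the third prediction has no gold counterpart.
gold = [
    "media bias is slanted news coverage",
    "a definition is a statement of meaning",
]
predicted = [
    "media bias is slanted news coverage",
    "a definition is a statement of meaning",
    "llms are large neural networks",
]
result = evaluate(gold, predicted)
print(result)
```

In this toy run every gold definition is recovered while one prediction has no gold counterpart, which mirrors the over-generation pattern the abstract mentions: recall stays high while precision drops.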
