CL, IR · Jun 26, 2025

Enhancing Automatic Term Extraction with Large Language Models via Syntactic Retrieval

Yongchan Chun, Minhyuk Kim, Dongjun Kim, Chanjun Park, Heuiseok Lim

arXiv:2506.21222v1 · 4.93 citations · h-index: 13 · ACL

Originality: Incremental advance

AI Analysis

This work addresses terminology extraction for NLP applications such as machine translation, but the contribution is incremental: it adapts existing LLM prompting methods, with a focus on syntactic cues.

The paper tackles automatic term extraction (ATE) with a retrieval-based prompting strategy that selects few-shot demonstrations by syntactic similarity, which improves F1-scores on three specialized benchmarks.
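Results are reported as F1-scores. The sketch below shows one common way to score term extraction: set-based precision, recall and F1 over predicted versus gold term strings. It assumes exact matching after lowercasing and whitespace stripping. The benchmarks' official scorers may normalize terms differently, and `term_f1` is a hypothetical helper name.

```python
def term_f1(predicted, gold):
    """Set-based precision, recall and F1 over term strings.

    Assumes exact matching after lowercasing and stripping whitespace;
    official benchmark scorers may normalize terms differently.
    """
    pred = {t.strip().lower() for t in predicted}
    ref = {t.strip().lower() for t in gold}
    tp = len(pred & ref)  # terms present in both sets
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = ["heart failure", "ejection fraction", "beta blocker"]
    predicted = ["Heart failure", "beta blocker", "patient"]
    p, r, f = term_f1(predicted, gold)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # prints P=0.667 R=0.667 F1=0.667
```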

Automatic Term Extraction (ATE) identifies domain-specific expressions that are crucial for downstream tasks such as machine translation and information retrieval. Although large language models (LLMs) have significantly advanced various NLP tasks, their potential for ATE has scarcely been examined. We propose a retrieval-based prompting strategy that, in the few-shot setting, selects demonstrations according to syntactic rather than semantic similarity. This syntactic retrieval method is domain-agnostic and provides more reliable guidance for capturing term boundaries. We evaluate the approach in both in-domain and cross-domain settings, analyzing how lexical overlap between the query sentence and its retrieved examples affects performance. Experiments on three specialized ATE benchmarks show that syntactic retrieval improves the F1-score. These findings highlight the importance of syntactic cues when adapting LLMs to terminology-extraction tasks.
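The sketch below illustrates selecting few-shot demonstrations by syntactic rather than semantic similarity, together with a lexical-overlap measure of the kind the abstract analyzes. The abstract does not specify the paper's similarity measure or prompt template, so several pieces here are assumptions. Each sentence is assumed to carry a precomputed part-of-speech tag sequence. Syntactic similarity is approximated by `difflib.SequenceMatcher` over those tags, a swapped-in stand-in. Lexical overlap is Jaccard similarity over lowercased tokens. The `Example` class, helper functions, prompt wording and toy sentences are all hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Example:
    tokens: list[str]    # word tokens of the sentence
    pos_tags: list[str]  # POS tags, assumed precomputed by some tagger
    terms: list[str]     # gold terms (empty for an unlabeled query)


def syntactic_similarity(a: Example, b: Example) -> float:
    # Stand-in measure: matching-block ratio over POS-tag sequences only,
    # so the words themselves (and hence the domain) are ignored.
    return SequenceMatcher(None, a.pos_tags, b.pos_tags).ratio()


def lexical_overlap(a: Example, b: Example) -> float:
    # Jaccard overlap of lowercased tokens, used to inspect retrieved demos.
    sa = {t.lower() for t in a.tokens}
    sb = {t.lower() for t in b.tokens}
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def retrieve_demonstrations(query: Example, pool: list[Example], k: int = 2) -> list[Example]:
    # Rank the labeled pool by syntactic similarity to the query; keep the top k.
    ranked = sorted(pool, key=lambda ex: syntactic_similarity(query, ex), reverse=True)
    return ranked[:k]


def build_prompt(query: Example, demos: list[Example]) -> str:
    # Hypothetical few-shot template: demonstrations first, then the query.
    parts = ["Extract the domain-specific terms from each sentence."]
    for d in demos:
        parts.append(f"Sentence: {' '.join(d.tokens)}\nTerms: {'; '.join(d.terms)}")
    parts.append(f"Sentence: {' '.join(query.tokens)}\nTerms:")
    return "\n\n".join(parts)


# Toy labeled pool and query (illustrative data, not from any benchmark).
POOL = [
    Example(
        tokens="The turbine blade suffers fatigue cracking .".split(),
        pos_tags="DET NOUN NOUN VERB NOUN NOUN PUNCT".split(),
        terms=["turbine blade", "fatigue cracking"],
    ),
    Example(
        tokens="Results were surprisingly good .".split(),
        pos_tags="NOUN AUX ADV ADJ PUNCT".split(),
        terms=[],
    ),
    Example(
        tokens="Patients with heart failure received beta blockers .".split(),
        pos_tags="NOUN ADP NOUN NOUN VERB NOUN NOUN PUNCT".split(),
        terms=["heart failure", "beta blockers"],
    ),
]

QUERY = Example(
    tokens="The heart valve shows leaflet calcification .".split(),
    pos_tags="DET NOUN NOUN VERB NOUN NOUN PUNCT".split(),
    terms=[],
)

if __name__ == "__main__":
    demos = retrieve_demonstrations(QUERY, POOL, k=1)
    for d in demos:
        print(f"syntactic={syntactic_similarity(QUERY, d):.2f} "
              f"lexical={lexical_overlap(QUERY, d):.2f}")
    print(build_prompt(QUERY, demos))
```

In this toy run the engineering sentence is retrieved for a medical query because its tag pattern matches exactly, even though the two sentences share only "The" and the final period. This is the cross-domain situation in which a syntax-based measure can still find a usable demonstration.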

