CL | Sep 20, 2025

Robust Native Language Identification through Agentic Decomposition

Ahmet Yavuz Uluslu, Tannon Kew, Tilia Ellendorff, Gerold Schneider, Rico Sennrich

arXiv:2509.16666v16.72 citations | h-index: 7 | EMNLP

Originality: Incremental advance

AI Analysis

This work addresses the robustness of native language identification, which matters for forensic linguistics and language analysis. The contribution is incremental: it builds on existing methods, adding a novel decomposition approach.

The paper tackles the problem of large language models relying on superficial contextual clues, rather than linguistic patterns, for native language identification. It introduces an agentic pipeline that significantly improves robustness against misleading hints, and performance consistency, on benchmark datasets.

Large language models (LLMs) often achieve high performance in native language identification (NLI) benchmarks by leveraging superficial contextual clues such as names, locations, and cultural stereotypes, rather than the underlying linguistic patterns indicative of native language (L1) influence. To improve robustness, previous work has instructed LLMs to disregard such clues. In this work, we demonstrate that such a strategy is unreliable and model predictions can be easily altered by misleading hints. To address this problem, we introduce an agentic NLI pipeline inspired by forensic linguistics, where specialized agents accumulate and categorize diverse linguistic evidence before an independent final overall assessment. In this final assessment, a goal-aware coordinating agent synthesizes all evidence to make the NLI prediction. On two benchmark datasets, our approach significantly enhances NLI robustness against misleading contextual clues and performance consistency compared to standard prompting methods.
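
The decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (`Evidence`, `collect_evidence`, `run_pipeline`, the toy agents, and the placeholder labels `L1-A` and `L1-B`) is hypothetical, and the agents are simple string heuristics standing in for LLM calls. The sketch also assumes that the coordinator receives only the categorized evidence rather than the raw text, which the abstract does not state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Evidence:
    category: str      # e.g. "syntax" or "orthography" (hypothetical categories)
    observation: str   # short description of what the agent noticed


# A specialized agent maps a text to a list of evidence items.
Agent = Callable[[str], List[Evidence]]
# The coordinator maps the pooled evidence to a predicted L1 label.
Coordinator = Callable[[List[Evidence]], str]


def collect_evidence(text: str, agents: Dict[str, Agent]) -> List[Evidence]:
    """Run every specialized agent on the text and pool their findings."""
    evidence: List[Evidence] = []
    for agent in agents.values():
        evidence.extend(agent(text))
    return evidence


def run_pipeline(text: str, agents: Dict[str, Agent], coordinator: Coordinator) -> str:
    """Assumption: the coordinator sees only the evidence, not the raw text."""
    return coordinator(collect_evidence(text, agents))


# --- Toy stand-ins for LLM-backed agents (illustrative only) ---

def toy_article_agent(text: str) -> List[Evidence]:
    # Flags one hard-coded pattern that looks like a missing article.
    if "bought car" in text.lower():
        return [Evidence("syntax", "possible missing article before 'car'")]
    return []


def toy_spelling_agent(text: str) -> List[Evidence]:
    # Flags one hard-coded misspelling.
    if "adress" in text.lower():
        return [Evidence("orthography", "spelling 'adress'")]
    return []


def toy_coordinator(evidence: List[Evidence]) -> str:
    # Placeholder decision rule; the labels are not real languages.
    categories = {item.category for item in evidence}
    if "syntax" in categories:
        return "L1-A"
    if "orthography" in categories:
        return "L1-B"
    return "unknown"


AGENTS: Dict[str, Agent] = {
    "articles": toy_article_agent,
    "spelling": toy_spelling_agent,
}

if __name__ == "__main__":
    plain = "Yesterday I bought car."
    hinted = "My name is Pierre and I live in Paris. " + plain
    print(run_pipeline(plain, AGENTS, toy_coordinator))
    print(run_pipeline(hinted, AGENTS, toy_coordinator))
```

In this toy setup, the added name and location change nothing because no agent extracts them as evidence, so both calls yield the same label. Whether an LLM-backed agent ignores such hints is an empirical question, which is what the paper evaluates.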
