CL · Jun 30, 2025

IMPACT: Inflectional Morphology Probes Across Complex Typologies

arXiv:2506.23929v1
Originality: Synthesis-oriented
AI Analysis

This work addresses a gap in LLM evaluation for non-English languages, which matters to researchers and developers working on multilingual AI. Its contribution is incremental, however, because it focuses on a single linguistic aspect: inflectional morphology.

The authors tackled the problem of evaluating how well large language models (LLMs) handle linguistic complexity in morphologically rich languages. They introduced IMPACT, a synthetic evaluation framework, and found that eight multilingual LLMs struggled with inflectional morphology, particularly when judging ungrammatical examples, and that Chain-of-Thought prompting and thinking models can degrade performance.

Large Language Models (LLMs) have shown significant progress on various multilingual benchmarks and are increasingly used to generate and evaluate text in non-English languages. However, while they may produce fluent outputs, it remains unclear to what extent these models truly grasp the underlying linguistic complexity of those languages, particularly in morphology. To investigate this, we introduce IMPACT, a synthetically generated evaluation framework focused on inflectional morphology, designed to evaluate LLM performance across five morphologically rich languages: Arabic, Russian, Finnish, Turkish, and Hebrew. IMPACT includes unit-test-style cases covering both shared and language-specific phenomena, from basic verb inflections (e.g., tense, number, gender) to unique features such as Arabic's reverse gender agreement and vowel harmony in Finnish and Turkish. We assess eight multilingual LLMs that, despite strong English performance, struggle with other languages and uncommon morphological patterns, especially when judging ungrammatical examples. We also show that Chain-of-Thought prompting and thinking models can degrade performance. Our work exposes gaps in LLMs' handling of linguistic complexity, pointing to clear room for improvement. To support further research, we publicly release the IMPACT framework.
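The unit-test-style cases described above can be pictured as minimal pairs: a correct inflected form next to a minimally different incorrect one. The sketch below is a hypothetical illustration, not the IMPACT release or its data format. It builds Turkish plural minimal pairs with the two-way vowel-harmony rule (-lar after a back vowel, -ler after a front vowel), covering regular native stems only and ignoring loanword exceptions such as saat → saatler. It then reports a judge's accuracy separately on grammatical and ungrammatical items, the split on which the abstract reports models struggle. The names `turkish_plural`, `make_minimal_pair`, `accuracy_by_label`, and the `judge` callable (a stand-in for an LLM) are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch: not the IMPACT data format or generation code.
# Turkish two-way vowel harmony: the plural suffix follows the stem's last vowel.
BACK_VOWELS = frozenset("aıou")
FRONT_VOWELS = frozenset("eiöü")


@dataclass(frozen=True)
class Case:
    text: str          # candidate word form shown to the judge
    grammatical: bool  # gold label


def last_vowel(stem: str) -> str:
    """Return the last vowel of a lowercase Turkish stem."""
    for ch in reversed(stem):
        if ch in BACK_VOWELS or ch in FRONT_VOWELS:
            return ch
    raise ValueError(f"no vowel in {stem!r}")


def turkish_plural(stem: str) -> str:
    """Attach -lar after a back vowel, -ler after a front vowel (regular stems only)."""
    return stem + ("lar" if last_vowel(stem) in BACK_VOWELS else "ler")


def make_minimal_pair(stem: str) -> list[Case]:
    """Return the correct plural and a form carrying the disharmonic suffix."""
    correct = turkish_plural(stem)
    wrong = stem + ("ler" if correct.endswith("lar") else "lar")
    return [Case(correct, True), Case(wrong, False)]


def accuracy_by_label(cases: list[Case], judge: Callable[[str], bool]) -> dict[str, float]:
    """Score a judge separately on grammatical and ungrammatical items."""
    counts = {True: [0, 0], False: [0, 0]}  # gold label -> [correct verdicts, total]
    for case in cases:
        tally = counts[case.grammatical]
        tally[0] += judge(case.text) == case.grammatical
        tally[1] += 1
    return {
        ("grammatical" if label else "ungrammatical"): hits / total
        for label, (hits, total) in counts.items()
        if total
    }


if __name__ == "__main__":
    cases = [c for stem in ["ev", "kitap", "göz", "okul"] for c in make_minimal_pair(stem)]
    print([c.text for c in cases])
    # A degenerate judge that accepts every form, standing in for an LLM.
    print(accuracy_by_label(cases, lambda text: True))
```

A judge that accepts everything scores 1.0 on the grammatical items and 0.0 on the ungrammatical ones. A single pooled accuracy of 0.5 would hide that bias, which is why reporting the two labels separately is informative.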

