CL · Oct 24, 2025

Are the LLMs Capable of Maintaining at Least the Language Genus?

Sandra Mitrović, David Kletz, Ljiljana Dolamic, Fabio Rinaldi

arXiv:2510.21561v1 · 1 citation · h-index: 4

Originality: Synthesis-oriented

AI Analysis

This work examines how LLMs handle genealogically related languages, a question relevant to multilingual AI applications. The contribution is incremental, since it extends prior analyses on the MultiQ dataset.

The researchers investigated whether large language models (LLMs) exhibit sensitivity to linguistic genera by analyzing their multilingual behavior on the MultiQ dataset, finding that genus-level effects exist but are strongly influenced by training resource availability and vary across model families.

Large Language Models (LLMs) display notable variation in multilingual behavior, yet the role of genealogical language structure in shaping this variation remains underexplored. In this paper, we investigate whether LLMs exhibit sensitivity to linguistic genera by extending prior analyses on the MultiQ dataset. We first check whether models prefer to switch to genealogically related languages when prompt language fidelity is not maintained. Next, we investigate whether knowledge consistency is better preserved within than across genera. We show that genus-level effects are present but strongly conditioned by training resource availability. We further observe distinct multilingual strategies across LLM families. Our findings suggest that LLMs encode aspects of genus-level structure, but training data imbalances remain the primary factor shaping their multilingual performance.
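The two analyses in the abstract can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the genus table (WALS-style labels for a handful of languages), the language codes, the function names `same_genus_switch_rate` and `consistency_by_genus`, and the exact-match agreement criterion are hypothetical and do not reproduce the authors' procedure or the MultiQ data format. The first function asks, among responses that did not stay in the prompt language, how often the model switched to a language of the same genus; the second compares answer agreement for language pairs within a genus against pairs across genera.

```python
from itertools import combinations

# Hypothetical, illustrative genus labels (WALS-style); not the paper's language set.
GENUS = {
    "es": "Romance", "it": "Romance", "fr": "Romance", "pt": "Romance",
    "de": "Germanic", "nl": "Germanic", "en": "Germanic",
    "ru": "Slavic", "pl": "Slavic",
}


def same_genus_switch_rate(pairs):
    """Among responses whose language differs from the prompt language,
    return the fraction that switched to a language of the same genus.
    `pairs` is a list of (prompt_lang, response_lang) codes (assumed format).
    Returns None if the model never switched language."""
    switches = [(p, r) for p, r in pairs if p != r]
    if not switches:
        return None
    same = sum(
        1 for p, r in switches
        if GENUS.get(p) is not None and GENUS.get(p) == GENUS.get(r)
    )
    return same / len(switches)


def consistency_by_genus(answers):
    """`answers` maps question id -> {lang: normalized answer} (assumed format).
    Agreement is exact string match, a simplifying assumption.
    Returns (within-genus agreement rate, cross-genus agreement rate)."""
    counts = {"within": [0, 0], "across": [0, 0]}  # [agreeing pairs, total pairs]
    for per_lang in answers.values():
        for a, b in combinations(sorted(per_lang), 2):
            key = "within" if GENUS[a] == GENUS[b] else "across"
            counts[key][1] += 1
            counts[key][0] += per_lang[a] == per_lang[b]
    return tuple(
        c[0] / c[1] if c[1] else None
        for c in (counts["within"], counts["across"])
    )


if __name__ == "__main__":
    # Toy, made-up inputs purely to exercise the functions.
    pairs = [("es", "es"), ("es", "it"), ("es", "en"), ("pl", "ru"), ("de", "en")]
    print(same_genus_switch_rate(pairs))  # 0.75

    answers = {
        "q1": {"es": "paris", "it": "paris", "de": "paris", "ru": "london"},
        "q2": {"es": "4", "it": "4", "de": "4", "ru": "5"},
    }
    print(consistency_by_genus(answers))  # (1.0, 0.4)
```

A raw same-genus rate is hard to interpret on its own: if most off-target responses fall back to a high-resource language such as English, prompts from that language's genus will look "genus-faithful" for reasons of resource availability rather than relatedness, which is the confound the abstract highlights.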

