CL

Feb 27, 2025

XCOMPS: A Multilingual Benchmark of Conceptual Minimal Pairs

Linyang He, Ercong Nie, Sukru Samet Dindar, Arsalan Firoozi, Adrian Florea, Van Nguyen, Corentin Puffay, Riki Shimizu, Haotian Ye, Jonathan Brennan, Helmut Schmid, Hinrich Schütze

arXiv:2502.19737v19.64 citations

h-index: 45

Proceedings of the 7th Workshop on Research in Computational Linguistic Typology and Multilingual NLP

Originality: Synthesis-oriented

AI Analysis

This work provides a new benchmark for evaluating multilingual conceptual understanding in LLMs. The contribution is incremental: it extends existing evaluation methods to more languages.

The authors introduced XCOMPS, a multilingual conceptual minimal pair dataset covering 17 languages, to evaluate LLMs' multilingual conceptual understanding, finding that LLMs show weaker performance for low-resource languages and struggle with subtle semantic similarities.

We introduce XCOMPS in this work, a multilingual conceptual minimal pair dataset covering 17 languages. Using this dataset, we evaluate LLMs' multilingual conceptual understanding through metalinguistic prompting, direct probability measurement, and neurolinguistic probing. By comparing base, instruction-tuned, and knowledge-distilled models, we find that: 1) LLMs exhibit weaker conceptual understanding for low-resource languages, and accuracy varies across languages despite being tested on the same concept sets. 2) LLMs excel at distinguishing concept-property pairs that are visibly different but exhibit a marked performance drop when negative pairs share subtle semantic similarities. 3) Instruction tuning improves performance in concept understanding but does not enhance internal competence; knowledge distillation can enhance internal competence in conceptual understanding for low-resource languages with limited gains in explicit task performance. 4) More morphologically complex languages yield lower concept understanding scores and require deeper layers for conceptual reasoning.
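As a rough illustration of the direct probability measurement mentioned above, the sketch below scores conceptual minimal pairs: a pair counts as correct when the scorer assigns a higher log-probability to the acceptable sentence than to its unacceptable counterpart. This is a minimal sketch under stated assumptions, not the authors' implementation. The sentences, the log-probability values in `TOY_LOGPROBS`, and the names `toy_score` and `minimal_pair_accuracy` are all hypothetical; in practice the scorer would presumably be the summed token log-probabilities from an LLM.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical, made-up sentence log-probabilities standing in for an LLM.
# These numbers are illustrative only and are not results from the paper.
TOY_LOGPROBS = {
    "a robin can fly": -8.0,
    "a penguin can fly": -11.5,
    "a knife is used for cutting": -12.0,
    "a spoon is used for cutting": -12.4,
    "a whale has lungs": -13.0,
    "a shark has lungs": -12.5,  # the toy scorer gets this pair wrong
}


def toy_score(sentence: str) -> float:
    """Look up the made-up log-probability of a sentence."""
    return TOY_LOGPROBS[sentence.lower()]


def minimal_pair_accuracy(
    pairs: Iterable[Tuple[str, str]],
    score: Callable[[str], float],
) -> float:
    """Fraction of (acceptable, unacceptable) pairs where the acceptable
    sentence receives the strictly higher score."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one minimal pair is required")
    correct = sum(1 for good, bad in pairs if score(good) > score(bad))
    return correct / len(pairs)


# Each pair differs only in the concept; the property is held fixed.
PAIRS = [
    ("a robin can fly", "a penguin can fly"),
    ("a knife is used for cutting", "a spoon is used for cutting"),
    ("a whale has lungs", "a shark has lungs"),
]

accuracy = minimal_pair_accuracy(PAIRS, toy_score)
print(f"minimal-pair accuracy: {accuracy:.2f}")
```

Passing the scorer in as a callable keeps the accuracy computation independent of any particular model, so the same function could be reused per language or per negative-sampling condition.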
