CL · AI · Jul 17, 2025

Multilingual LLMs Are Not Multilingual Thinkers: Evidence from Hindi Analogy Evaluation

arXiv:2507.13238v2 · h-index: 2 · Proceedings of the 2nd Workshop on Analogical Abstraction in Cognition, Perception, and Language (Analogy-Angle II)

Originality: Synthesis-oriented

AI Analysis

This work addresses the lack of resources for evaluating LLM reasoning in Hindi. Its contribution is incremental, as it extends existing analogy benchmarks to a new language.

The authors address the understudied reasoning capabilities of multilingual LLMs in Indic languages by introducing the Hindi Analogy Test Set (HATS), a set of 405 questions. They find that models perform best with English prompts and that a grounded Chain of Thought approach improves performance on Hindi analogies.

Analogies test a model's ability to infer implicit relationships between concepts, making them a key benchmark for evaluating reasoning capabilities. While large language models (LLMs) are widely evaluated for reasoning in English, their abilities in Indic languages remain understudied, limiting our understanding of whether these models generalize across languages. To address this gap, we introduce a new Hindi Analogy Test Set (HATS), comprising 405 multiple-choice questions sourced from Indian government exams. We benchmark state-of-the-art multilingual LLMs using various prompting strategies and introduce a grounded Chain of Thought approach that leverages cognitive theories of analogical reasoning. This approach improves model performance on Hindi analogy questions. Our experiments show that models perform best with English prompts, irrespective of the prompting strategy. Our test set addresses the lack of a critical resource to evaluate LLM reasoning capabilities in Hindi.

