CL | Jan 29

MasalBench: A Benchmark for Contextual and Cross-Cultural Understanding of Persian Proverbs in LLMs

arXiv:2601.22050v1 | 0.6 | h-index: 15 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for better cross-cultural understanding in LLMs for low-resource languages such as Persian, though the contribution is incremental: it extends existing figurative-language benchmarking to a new domain.

The paper evaluates multilingual LLMs' understanding of figurative language in a low-resource language, using Persian proverbs as the test case. Models identify Persian proverbs in context with accuracies above 0.90, but performance drops considerably when they must identify equivalent English proverbs, where the best model reaches 0.79 accuracy.

In recent years, multilingual Large Language Models (LLMs) have become an inseparable part of daily life, making it crucial for them to master the rules of conversational language in order to communicate effectively with users. While previous work has evaluated LLMs' understanding of figurative language in high-resource languages, their performance in low-resource languages remains underexplored. In this paper, we introduce MasalBench, a comprehensive benchmark for assessing LLMs' contextual and cross-cultural understanding of Persian proverbs, which are a key component of conversation in this low-resource language. We evaluate eight state-of-the-art LLMs on MasalBench and find that they perform well in identifying Persian proverbs in context, achieving accuracies above 0.90. However, their performance drops considerably when tasked with identifying equivalent English proverbs, with the best model achieving 0.79 accuracy. Our findings highlight the limitations of current LLMs in cultural knowledge and analogical reasoning, and they provide a framework for assessing cross-cultural understanding in other low-resource languages. MasalBench is available at https://github.com/kalhorghazal/MasalBench.

