AILG · Aug 24, 2025

MEENA (PersianMMMU): Multimodal-Multilingual Educational Exams for N-level Assessment

arXiv:2508.17290v1 · 3 citations · h-index: 21
Originality: Synthesis-oriented
AI Analysis

This paper addresses the shortage of multilingual benchmarks for vision-language models (VLMs), specifically for Persian. The contribution is incremental: it extends existing dataset-creation approaches to a new language.

The authors tackle the lack of evaluation resources for Persian VLMs by introducing MEENA, a dataset of approximately 7,500 Persian and 3,000 English questions on educational topics. It supports assessment of capabilities such as reasoning, attention to images, and the tendency to hallucinate.

Recent advancements in large vision-language models (VLMs) have primarily focused on English, with limited attention given to other languages. To address this gap, we introduce MEENA (also known as PersianMMMU), the first dataset designed to evaluate Persian VLMs across scientific, reasoning, and human-level understanding tasks. Our dataset comprises approximately 7,500 Persian and 3,000 English questions, covering a wide range of topics such as reasoning, mathematics, physics, diagrams, charts, and Persian art and literature. Key features of MEENA include: (1) diverse subject coverage spanning various educational levels, from primary to upper secondary school, (2) rich metadata, including difficulty levels and descriptive answers, (3) original Persian data that preserves cultural nuances, (4) a bilingual structure to assess cross-linguistic performance, and (5) a series of diverse experiments assessing various capabilities, including overall performance, the model's ability to attend to images, and its tendency to generate hallucinations. We hope this benchmark contributes to enhancing VLM capabilities beyond English.

