CL | Nov 22, 2024

Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains

Yurii Paniv, Artur Kiulian, Dmytro Chaplynskyi, Mykola Khandoga, Anton Polishko, Tetiana Bas, Guillermo Gabrielli

arXiv:2411.14647v14.23 citations | h-index: 4 | Has Code | Proceedings of the Fourth Ukrainian Natural Language Processing Workshop (UNLP 2025)

Originality: Synthesis-oriented

AI Analysis

This work addresses the evaluation of multimodal models for the Ukrainian language. The contribution is incremental: it extends existing benchmarking approaches to a new linguistic context.

The authors address the lack of multimodal benchmarks for low-resource languages by introducing ZNO-Vision, a Ukrainian-centric benchmark with over 4,300 questions across 12 disciplines. They found that only a few models performed above baseline and that performance degraded on tasks translated into Ukrainian.

While the evaluation of multimodal English-centric models is an active area of research with numerous benchmarks, there is a profound lack of benchmarks or evaluation suites for low- and mid-resource languages. We introduce ZNO-Vision, a comprehensive multimodal Ukrainian-centric benchmark derived from the standardized university entrance examination (ZNO). The benchmark consists of over 4,300 expert-crafted questions spanning 12 academic disciplines, including mathematics, physics, chemistry, and humanities. We evaluated the performance of both open-source models and API providers, finding that only a handful of models performed above baseline. Alongside the new benchmark, we performed the first evaluation study of multimodal text generation for the Ukrainian language: we measured caption generation quality on the Multi30K-UK dataset, translated the VQA benchmark into Ukrainian, and measured performance degradation relative to the original English versions. Lastly, we tested a few models from a cultural perspective on knowledge of national cuisine. We believe our work will advance multimodal generation capabilities for the Ukrainian language and that our approach could be useful for other low-resource languages.
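The abstract mentions two quantities without defining them: performance "above baseline" and "performance degradation" relative to the English originals. The sketch below shows one plausible way to compute both for multiple-choice items. It is not the authors' evaluation code. The `Question` schema, the function names, the use of uniform random guessing as the baseline, and the relative-drop definition of degradation are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Question:
    """One multiple-choice item (hypothetical schema, not the ZNO-Vision format)."""
    options: Sequence[str]  # answer labels offered to the model, e.g. ["A", "B", "C", "D"]
    answer: str             # the reference label


def accuracy(questions: Sequence[Question], predictions: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the reference answer."""
    if len(questions) != len(predictions):
        raise ValueError("questions and predictions must have the same length")
    if not questions:
        raise ValueError("need at least one question")
    correct = sum(q.answer == p for q, p in zip(questions, predictions))
    return correct / len(questions)


def random_baseline(questions: Sequence[Question]) -> float:
    """Expected accuracy of uniform random guessing (assumed baseline definition)."""
    if not questions:
        raise ValueError("need at least one question")
    return sum(1 / len(q.options) for q in questions) / len(questions)


def relative_degradation(original_score: float, translated_score: float) -> float:
    """Relative drop from the original-language score to the translated one."""
    if original_score <= 0:
        raise ValueError("original_score must be positive")
    return (original_score - translated_score) / original_score
```

Under these assumptions, a model would count as "above baseline" when `accuracy(questions, predictions) > random_baseline(questions)`, and a positive `relative_degradation` value would indicate that the translated benchmark is harder for the model than the English original.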
