CV | CL | Sep 21, 2025

VaseVQA: Multimodal Agent and Benchmark for Ancient Greek Pottery

arXiv:2509.17191v1 | 4 citations | h-index: 7 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of equipping multimodal large language models (MLLMs) with domain expertise for cultural-heritage artifact analysis, and it provides a reusable resource for the AI and archaeology communities. The advance is incremental, however, because it builds on existing SFT and RL techniques.

The researchers address the problem of getting multimodal large language models to reason robustly, at an expert level, about ancient Greek pottery. They develop VaseVL, an SFT-then-RL system that uses a taxonomy of question types to identify type-specific performance gaps and then optimizes against them. The approach achieves state-of-the-art results on style classification and historical attribution, with marked gains in compositional robustness over SFT-only baselines. They also release the VaseVQA benchmark, which contains 31,773 images, for future research.

Analyzing cultural-heritage artifacts remains challenging for MLLMs: general models lack domain expertise, and SFT often overfits superficial patterns, yielding brittle reasoning for authentication and historical attribution. This raises the question of how to equip MLLMs with robust, expert-level reasoning for ancient Greek pottery. We present VaseVL, an SFT-then-RL system that turns evaluation into supervision: we construct a taxonomy of question types, probe the SFT model to localize type-specific performance gaps, and optimize with type-conditioned, compositionality-oriented rewards targeting those gaps. We also release VaseVQA, a comprehensive benchmark of 31,773 images designed to probe deep understanding. Experiments show state-of-the-art results on style classification and historical attribution with marked gains in compositional robustness over SFT-only baselines, validating diagnosis-guided, taxonomy-conditioned reward engineering and providing a reusable resource for future research. Code and dataset will be available at https://github.com/AIGeeksGroup/VaseVQA.
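The abstract's "evaluation into supervision" loop (probe the SFT model by question type, localize the gaps, then reward by type) can be illustrated with a small sketch. This is not the authors' implementation. The question-type names, the toy probe results, and the weighting rule (weights proportional to per-type error rate) are all assumptions made for illustration, and the paper's actual reward design may differ.

```python
from collections import defaultdict


def per_type_accuracy(records):
    """Compute accuracy per question type from (question_type, is_correct) pairs."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for qtype, correct in records:
        totals[qtype] += 1
        hits[qtype] += int(correct)
    return {t: hits[t] / totals[t] for t in totals}


def gap_weights(accuracy):
    """Map per-type error rates to weights that sum to 1.

    Hypothetical rule: a type's weight is proportional to its error rate,
    so the weakest question types receive the largest reward weight.
    """
    gaps = {t: 1.0 - a for t, a in accuracy.items()}
    total = sum(gaps.values())
    if total == 0:  # no errors anywhere: fall back to uniform weights
        return {t: 1.0 / len(gaps) for t in gaps}
    return {t: g / total for t, g in gaps.items()}


def type_conditioned_reward(qtype, is_correct, weights):
    """Reward a correct answer in proportion to its type's diagnosed gap."""
    return weights.get(qtype, 0.0) * (1.0 if is_correct else 0.0)


# Toy probe results for an SFT model (made-up data, hypothetical type names).
probe = (
    [("style", True)] * 3 + [("style", False)] * 1
    + [("attribution", True)] * 1 + [("attribution", False)] * 3
    + [("decoration", True)] * 2 + [("decoration", False)] * 2
)

accuracy = per_type_accuracy(probe)
weights = gap_weights(accuracy)
reward = type_conditioned_reward("attribution", True, weights)
```

In this toy setup the weakest type ("attribution") ends up with the largest weight, so a correct answer there earns more reward during RL than a correct answer on a type the SFT model already handles well.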

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
