CV, AI · May 15, 2025

MIRAGE: A Multi-modal Benchmark for Spatial Perception, Reasoning, and Intelligence

Chonghan Liu, Haoran Wang, Felix Henry, Pu Miao, Yajie Zhang, Yu Zhao, Peiran Wu

arXiv:2505.10604v2 · 10.23 citations · h-index: 5

Originality: Synthesis-oriented

AI Analysis

MIRAGE addresses the need for improved representations and reasoning frameworks in computer vision and offers a pathway toward spatiotemporal reasoning, but the contribution is incremental because it builds on existing benchmark efforts.

The paper addresses gaps in models' spatial perception and reasoning abilities by proposing MIRAGE, a multi-modal benchmark that evaluates three capabilities: counting, relation, and counting with relation. Its results reveal critical limitations in state-of-the-art models.

Spatial perception and reasoning are core components of human cognition, encompassing object recognition, spatial relational understanding, and dynamic reasoning. Despite progress in computer vision, existing benchmarks reveal significant gaps in models' abilities to accurately recognize object attributes and reason about spatial relationships, both essential for dynamic reasoning. To address these limitations, we propose MIRAGE, a multi-modal benchmark designed to evaluate models' capabilities in Counting (object attribute recognition), Relation (spatial relational reasoning), and Counting with Relation. Through diverse and complex scenarios requiring fine-grained recognition and reasoning, MIRAGE highlights critical limitations in state-of-the-art models, underscoring the need for improved representations and reasoning frameworks. By targeting these foundational abilities, MIRAGE provides a pathway toward spatiotemporal reasoning in future research.
