AI · Apr 16

MirrorBench: Evaluating Self-centric Intelligence in MLLMs by Introducing a Mirror

arXiv:2604.14785 · 80.8 · h-index: 33
Predicted impact: top 40% in AI · last 90 days
Originality: Incremental advance
AI Analysis

This benchmark addresses the lack of systematic evaluation of self-centric intelligence in embodied MLLMs, providing a principled framework for assessing general intelligence in large models.

MirrorBench introduces a simulation-based benchmark inspired by the Mirror Self-Recognition test to evaluate self-centric intelligence in Multimodal Large Language Models (MLLMs). Experiments show that even at the lowest level, MLLMs perform substantially worse than humans, revealing fundamental limitations in self-referential understanding.

Recent progress in Multimodal Large Language Models (MLLMs) has demonstrated remarkable advances in perception and reasoning, suggesting their potential for embodied intelligence. While recent studies have evaluated embodied MLLMs in interactive settings, current benchmarks mainly target capabilities to perceive, understand, and interact with external objects, lacking a systematic evaluation of self-centric intelligence. To address this, we introduce MirrorBench, a simulation-based benchmark inspired by the classical Mirror Self-Recognition (MSR) test in psychology. MirrorBench extends this paradigm to embodied MLLMs through a tiered framework of progressively challenging tasks, assessing agents from basic visual perception to high-level self-representation. Experiments on leading MLLMs show that even at the lowest level, their performance remains substantially inferior to human performance, revealing fundamental limitations in self-referential understanding. Our study bridges psychological paradigms and embodied intelligence, offering a principled framework for evaluating the emergence of general intelligence in large models. Project page: https://fflahm.github.io/mirror-bench-page/.
