AICY
May 9

Mirror, Mirror on the Wall: Can VLM Agents Tell Who They Are at All?

arXiv:2605.08816
AI Analysis

This work provides a diagnostic for evaluating grounded self-identification in embodied VLM agents. It addresses a gap in assessing whether apparent higher-order cognition is actually grounded in perception and action rather than produced by prompt compliance or confabulation.

The paper introduces a 3D benchmark that tests whether embodied vision-language model (VLM) agents can recognize themselves in a mirror. It finds that mirror-based self-identification emerges mainly in stronger VLMs, while weaker models either fail to extract self-relevant information or misattribute their reflection.

In the animal kingdom, mirror self-recognition is a canonical probe of higher-order cognition, emerging only in some species. We ask whether an analogous functional capability emerges in embodied vision-language model (VLM) agents: can they recognize themselves in a mirror? We introduce a controlled 3D benchmark in which a first-person VLM agent must infer a hidden body attribute from its reflection and select the matching target while avoiding self-other misattribution. To separate mirror-grounded self-identification from shortcuts, we include control conditions with the mirror removed, with misleading cues, and with occluded reflections. We also evaluate the decision process through mirror seeking, temporal ordering, self-attribution, and reasoning-action consistency. Our experiments show that mirror-based self-identification emerges mainly in stronger VLMs. These models can use reflected evidence to guide action, whereas weaker models often inspect the mirror but either fail to extract self-relevant information or misattribute their reflection. Experiments with language-vision conflict further show that self-referential language alone is not evidence of grounded self-identification. Overall, mirror-based evaluation provides a diagnostic for whether embodied self-grounding is causally rooted in perception and action rather than in priors, prompt compliance, or confabulation.
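The abstract describes the evaluation only at a high level. The following is a minimal Python sketch of how task success and the listed process metrics (mirror seeking, temporal ordering, self-attribution, reasoning-action consistency) might be scored from an agent's action trace, plus a simple mirror-removal comparison. It is an illustration under stated assumptions, not the paper's benchmark code. The `Step`/`Episode` records, the action names `look_mirror` and `select`, and all scoring rules are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One agent action in an episode trace (hypothetical schema)."""
    action: str                   # e.g. "move", "look_mirror", "select"
    target: Optional[str] = None  # candidate chosen when action == "select"


@dataclass
class Episode:
    """A single trial: the hidden attribute plus what the agent did and said."""
    true_attribute: str                                   # hidden body attribute, e.g. "red"
    steps: List[Step]
    stated_attribute: Optional[str] = None                # attribute named in the agent's reasoning
    attributed_reflection_to_self: Optional[bool] = None  # did it treat the reflection as itself?


def score_episode(ep: Episode) -> dict:
    """Score one episode on task success and simple process metrics (assumed definitions)."""
    looks = [i for i, s in enumerate(ep.steps) if s.action == "look_mirror"]
    selects = [i for i, s in enumerate(ep.steps) if s.action == "select"]
    # Only the first selection counts as the agent's decision.
    selected = ep.steps[selects[0]].target if selects else None
    return {
        "success": selected == ep.true_attribute,
        "mirror_seeking": bool(looks),
        # Temporal ordering: the mirror was inspected before the decision.
        "temporal_ordering": bool(looks and selects) and looks[0] < selects[0],
        # Self-attribution: the agent explicitly treated the reflection as itself.
        "self_attribution": ep.attributed_reflection_to_self is True,
        # Reasoning-action consistency: the stated attribute matches the choice.
        "reasoning_action_consistent": (
            ep.stated_attribute is not None and ep.stated_attribute == selected
        ),
    }


def success_rate(episodes: List[Episode]) -> float:
    """Fraction of episodes in which the agent picked the matching target."""
    if not episodes:
        return 0.0
    return sum(score_episode(e)["success"] for e in episodes) / len(episodes)


def mirror_gain(with_mirror: List[Episode], without_mirror: List[Episode]) -> float:
    """Success with the mirror minus success under a mirror-removal control.

    A gain near zero would suggest the agent is not actually using the reflection.
    """
    return success_rate(with_mirror) - success_rate(without_mirror)
```

Scoring process metrics separately from success is the point of this design. An agent that selects first and only inspects the mirror afterwards fails temporal ordering even if its choice happens to be correct. The misleading-cue and occluded-reflection controls could be handled the same way, by grouping episodes by condition and comparing success rates, again as an assumption rather than the paper's protocol.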

