CL | Jul 8, 2025

Bridging Perception and Language: A Systematic Benchmark for LVLMs' Understanding of Amodal Completion Reports

arXiv:2507.05799v1 | h-index: 12 | CogSci
Originality: Incremental advance
AI Analysis

This work addresses a gap in assessing LVLMs' inferential abilities for multimodal tasks involving perceptual descriptions, though it is incremental as it focuses on a specific benchmark.

The paper evaluates how well large vision-language models (LVLMs) understand texts describing amodal completion. It finds that many LVLMs achieve human-comparable performance overall, but their accuracy varies with the type of object being completed, and under Japanese prompting some models are less accurate on original images than on blank stimuli.

One of the main objectives in developing large vision-language models (LVLMs) is to engineer systems that can assist humans with multimodal tasks, including interpreting descriptions of perceptual experiences. A central phenomenon in this context is amodal completion, in which people perceive objects even when parts of those objects are hidden. Although numerous studies have assessed whether computer-vision algorithms can detect or reconstruct occluded regions, the inferential abilities of LVLMs on texts related to amodal completion remain unexplored. To address this gap, we constructed a benchmark grounded in Basic Formal Ontology to achieve a systematic classification of amodal completion. Our results indicate that while many LVLMs achieve human-comparable performance overall, their accuracy diverges for certain types of objects being completed. Notably, in certain categories, some LLaVA-NeXT variants and Claude 3.5 Sonnet exhibit lower accuracy on original images compared to blank stimuli lacking visual content. Intriguingly, this disparity emerges only under Japanese prompting, suggesting a deficiency in Japanese-specific linguistic competence among these models.
