MEBench: A Novel Benchmark for Understanding Mutual Exclusivity Bias in Vision-Language Models
Provides a benchmark for testing a specific cognitive bias in VLMs; an incremental contribution to understanding model limitations.
MEBench evaluates mutual exclusivity bias in vision-language models, finding that they exhibit weak bias but can use spatial context to resolve ambiguity.
This paper introduces MEBench, a novel benchmark for evaluating mutual exclusivity (ME) bias, a cognitive phenomenon observed in children during word learning. Unlike traditional ME tasks, MEBench also incorporates spatial reasoning to create more challenging and realistic evaluation settings. To facilitate controlled experimentation, we also present a flexible and scalable data generation pipeline that supports the construction of diverse annotated scenes. We assess the performance of various vision-language models (VLMs) on this benchmark using novel evaluation metrics that capture key aspects of ME-based reasoning. We find that these VLMs exhibit weak ME bias, while showing some ability to leverage extra spatial context to resolve ambiguity in settings with multiple novel objects. Project page: http://mebench.github.io/.