CL | Feb 17, 2025

VAQUUM: Are Vague Quantifiers Grounded in Visual Data?

arXiv:2502.11874v3 | 3 citations | h-index: 19 | ACL
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of grounding vague language in AI, with applications in human-computer interaction, though its contribution is incremental because it focuses on benchmarking existing models.

The paper evaluates whether vision-and-language models align with humans in using vague quantifiers such as "a few" in visual contexts. It finds that models are influenced by object counts but behave inconsistently across evaluation methods.

Vague quantifiers such as "a few" and "many" are influenced by various contextual factors, including the number of objects present in a given context. In this work, we evaluate the extent to which vision-and-language models (VLMs) are compatible with humans when producing or judging the appropriateness of vague quantifiers in visual contexts. We release a novel dataset, VAQUUM, containing 20,300 human ratings on quantified statements across a total of 1089 images. Using this dataset, we compare human judgments and VLM predictions using three different evaluation methods. Our findings show that VLMs, like humans, are influenced by object counts in vague quantifier use. However, we find significant inconsistencies across models in different evaluation settings, suggesting that judging and producing vague quantifiers rely on two different processes.
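To make the comparison between human judgments and model predictions concrete, here is a minimal sketch of one way to check whether both track object counts: a Spearman rank correlation computed with the standard library. This is not the paper's evaluation code, and it does not reproduce any of its three evaluation methods. The variable names, the rating scale, and every number below are made-up toy values, not VAQUUM data.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical toy data (NOT from VAQUUM): one entry per image.
object_counts = [1, 3, 5, 10, 20, 40]
# Assumed mean human appropriateness rating for "many", on an assumed 0-100 scale.
human_many = [5, 10, 20, 45, 80, 95]
# Assumed model score for "many" on the same images (e.g. a probability).
model_many = [0.05, 0.04, 0.20, 0.50, 0.70, 0.90]

print("humans vs. counts:", round(spearman(object_counts, human_many), 3))
print("model vs. counts: ", round(spearman(object_counts, model_many), 3))
print("model vs. humans: ", round(spearman(human_many, model_many), 3))
```

A rank correlation is used here because human ratings and model scores may live on different scales, so only their ordering across images is compared. A high correlation with counts in one setting says nothing about other settings, which is the kind of inconsistency the abstract reports.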

