CV · Apr 14

A Sanity Check on Composed Image Retrieval

arXiv:2604.12904 · 31.7 · h-index: 22
Predicted impact: top 14% in CV · last 90 days · Originality: Incremental advance
AI Analysis

For researchers in composed image retrieval, this work provides more reliable evaluation tools that address critical flaws in current benchmarks and enable assessment of interactive capabilities.

The paper identifies that existing Composed Image Retrieval (CIR) benchmarks suffer from query ambiguity and lack multi-round evaluation. It introduces FISD, a new benchmark with controlled variables for accurate evaluation across six dimensions, and an automatic multi-round agentic evaluation framework, demonstrating their value through experiments.

Composed Image Retrieval (CIR) aims to retrieve a target image from a query composed of a reference image and a relative caption that specifies the desired modification. Despite the rapid development of CIR models, existing benchmarks do not characterize their performance well, for two reasons. First, the benchmarks inherently contain indeterminate queries, in which multiple candidate images, rather than only the target image, satisfy the query, and this degrades the evaluation. Second, they do not consider how effective the models are in a multi-round system. Motivated by this, we improve the evaluation procedure in two respects: 1) we introduce FISD, a Fully-Informed Semantically-Diverse benchmark, which employs generative models to precisely control the variables of reference-target image pairs, enabling a more accurate evaluation of CIR methods across six dimensions, without query ambiguity; 2) we propose an automatic multi-round agentic evaluation framework to probe the potential of existing models in interactive scenarios. By observing how models adapt and refine their choices over successive rounds of queries, this framework provides a more realistic appraisal of their efficacy in practical applications. Extensive experiments and comparisons demonstrate the value of our evaluation on typical CIR methods.
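To make the indeterminate-query problem concrete, here is a minimal sketch of how a single-target label can understate Recall@1 when more than one gallery image satisfies the query. It is not the paper's evaluation code. The function `recall_at_k`, the query and image IDs, and the `valid` sets are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def recall_at_k(rankings: Dict[str, List[str]],
                targets: Dict[str, Set[str]],
                k: int) -> float:
    """Fraction of queries with at least one accepted target in the top-k results."""
    hits = sum(
        1 for query, ranked in rankings.items()
        if targets[query] & set(ranked[:k])
    )
    return hits / len(rankings)


# Hypothetical toy data: ranked gallery IDs returned by a model for each query.
rankings = {
    "q1": ["img_a", "img_b", "img_c"],
    "q2": ["img_d", "img_e", "img_f"],
}

# Benchmark labels: exactly one annotated target per query.
labeled = {"q1": {"img_b"}, "q2": {"img_d"}}

# Assumed full set of images that actually satisfy each query.
# For q1, both img_a and img_b fit the caption, so the query is indeterminate.
valid = {"q1": {"img_a", "img_b"}, "q2": {"img_d"}}

strict_r1 = recall_at_k(rankings, labeled, k=1)   # scored against the single label
lenient_r1 = recall_at_k(rankings, valid, k=1)    # scored against all valid images
print(strict_r1, lenient_r1)
```

In this toy case the model's top result for `q1` is a valid answer, yet it counts as a miss under the single-target label. A benchmark with unambiguous queries is meant to remove this kind of gap between the two scores.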
