CV · AI · Mar 31

MELT: Improve Composed Image Retrieval via the Modification Frequentation-Rarity Balance Network

arXiv:2603.29291 · 81.019 citations · h-index: 6 · Has Code
Predicted impact: top 27% in CV · last 90 days · Originality: Incremental advance
AI Analysis

This work improves retrieval accuracy for queries that pair a reference image with a text instruction describing how to modify it, representing an incremental advance in the field.

The paper tackles the problem of composed image retrieval by addressing frequency bias and similarity score interference, resulting in superior performance on two benchmarks.

Composed Image Retrieval (CIR) uses a reference image and a modification text as a query to retrieve a target image satisfying the requirement of "modifying the reference image according to the text instructions". However, existing CIR methods face two limitations: (1) frequency bias leading to "Rare Sample Neglect", and (2) susceptibility of similarity scores to interference from hard negative samples and noise. To address these limitations, we confront two key challenges: asymmetric rare semantic localization and robust similarity estimation under hard negative samples. To solve these challenges, we propose the Modification frEquentation-rarity baLance neTwork (MELT). MELT assigns increased attention to rare modification semantics in multimodal contexts while applying diffusion-based denoising to hard negative samples with high similarity scores, enhancing multimodal fusion and matching. Extensive experiments on two CIR benchmarks validate the superior performance of MELT. Code is available at https://github.com/luckylittlezhi/MELT.
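The frequency-bias problem named in the abstract can be illustrated with a small sketch. This is not MELT's mechanism, which the abstract does not specify; it swaps in a plain smoothed inverse-document-frequency weighting to show how rare modification words could receive more weight than common ones. The function names `rarity_weights` and `attention_over_tokens` and the example captions are hypothetical.

```python
import math
from collections import Counter


def rarity_weights(texts):
    """Return a smoothed inverse-document-frequency weight per token."""
    n_docs = len(texts)
    doc_freq = Counter()
    for text in texts:
        # Count each token once per text (document frequency).
        doc_freq.update(set(text.lower().split()))
    # Smoothed IDF: rarer tokens get larger weights; all weights are >= 1.
    return {
        tok: math.log((1 + n_docs) / (1 + df)) + 1.0
        for tok, df in doc_freq.items()
    }


def attention_over_tokens(text, weights):
    """Normalize the rarity weights of the tokens in `text` to sum to 1."""
    tokens = text.lower().split()
    default = max(weights.values())  # assumption: unseen tokens count as maximally rare
    raw = [weights.get(tok, default) for tok in tokens]
    total = sum(raw)
    return [(tok, w / total) for tok, w in zip(tokens, raw)]


# Hypothetical modification texts, not taken from any CIR benchmark.
modification_texts = [
    "make the dress red",
    "make the dress longer",
    "make the shirt red",
    "add a paisley pattern to the dress",
]

weights = rarity_weights(modification_texts)
attention = attention_over_tokens("make the dress paisley", weights)

for token, share in attention:
    print(f"{token:8s} {share:.3f}")
```

In this toy corpus "paisley" appears in only one text, so it receives the largest share of the normalized weights, while "the", which appears in every text, receives the smallest. A learned model would have to localize rare semantics in both the image and the text, which a word-count heuristic like this cannot do.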

