CV · Mar 24, 2024

Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval

arXiv:2403.16005v1 · 56 citations · h-index: 47 · CVPR
Originality: Incremental advance
AI Analysis

This work addresses retrieving a target image from a reference image and a text description without training on triplet datasets, offering an incremental improvement for computer vision and retrieval applications.

The paper tackles the zero-shot composed image retrieval task by proposing a knowledge-enhanced dual-stream framework. It targets a limitation of prior methods, which ignore detailed attributes of the reference image, and reports improved performance on ImageNet-R, COCO object, Fashion-IQ, and CIRR.

We study the zero-shot Composed Image Retrieval (ZS-CIR) task, which is to retrieve the target image given a reference image and a description without training on the triplet datasets. Previous works generate pseudo-word tokens by projecting the reference image features to the text embedding space. However, they focus on the global visual representation, ignoring the representation of detailed attributes, e.g., color, object number and layout. To address this challenge, we propose a Knowledge-Enhanced Dual-stream zero-shot composed image retrieval framework (KEDs). KEDs implicitly models the attributes of the reference images by incorporating a database. The database enriches the pseudo-word tokens by providing relevant images and captions, emphasizing shared attribute information in various aspects. In this way, KEDs recognizes the reference image from diverse perspectives. Moreover, KEDs adopts an extra stream that aligns pseudo-word tokens with textual concepts, leveraging pseudo-triplets mined from image-text pairs. The pseudo-word tokens generated in this stream are explicitly aligned with fine-grained semantics in the text embedding space. Extensive experiments on widely used benchmarks, i.e. ImageNet-R, COCO object, Fashion-IQ and CIRR, show that KEDs outperforms previous zero-shot composed image retrieval methods.
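
The retrieval flow described in the abstract can be illustrated with a toy sketch. This is not the KEDs implementation: the function names (`enrich`, `compose_query`, `retrieve`), the 3-dimensional hand-written features, the neighbour count `k`, and the blending weight `alpha` are all assumptions made for illustration. A simple linear blend stands in for the learned network that fuses database neighbours into the pseudo-word token, and a vector sum stands in for encoding a prompt that contains the pseudo-word token with a text encoder.

```python
import math

def normalize(v):
    """Scale a vector to unit length (returns a copy if the norm is zero)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def enrich(ref_image, database, k=2, alpha=0.5):
    """Hypothetical stand-in for knowledge enhancement: blend the reference
    feature with the mean image/caption feature of its k nearest database entries."""
    neighbours = sorted(database, key=lambda e: cosine(ref_image, e["image"]),
                        reverse=True)[:k]
    dim = len(ref_image)
    mean = [0.0] * dim
    for entry in neighbours:
        for i in range(dim):
            mean[i] += (entry["image"][i] + entry["caption"][i]) / (2 * len(neighbours))
    ref = normalize(ref_image)
    return normalize([(1 - alpha) * r + alpha * m for r, m in zip(ref, mean)])

def compose_query(pseudo_token, text_embedding):
    """Assumed composition: sum the pseudo-word token and the description embedding."""
    return normalize([p + t for p, t in zip(pseudo_token, text_embedding)])

def retrieve(query, gallery):
    """Return the gallery key whose feature is most similar to the query."""
    return max(gallery, key=lambda name: cosine(query, gallery[name]))

# Toy feature axes (assumed): [dog, red, beach]
database = [
    {"image": [1.0, 0.8, 0.0], "caption": [1.0, 1.0, 0.0]},  # red dog
    {"image": [1.0, 0.7, 0.1], "caption": [0.9, 1.0, 0.0]},  # red dog
    {"image": [0.0, 0.1, 1.0], "caption": [0.0, 0.0, 1.0]},  # beach scene
]
ref_image = [1.0, 0.2, 0.0]   # global feature barely encodes the colour
description = [0.0, 0.0, 1.0]  # "on a beach"
gallery = {
    "red_dog_beach": [1.0, 1.0, 1.0],
    "grey_dog_beach": [1.0, 0.0, 1.0],
    "red_dog_park": [1.0, 1.0, 0.0],
}

# Baseline: use only the global reference feature as the pseudo-word token.
baseline_hit = retrieve(compose_query(normalize(ref_image), description), gallery)
# Enriched: fold in attribute evidence from similar database images and captions.
enriched_hit = retrieve(compose_query(enrich(ref_image, database), description), gallery)

print("baseline:", baseline_hit)
print("enriched:", enriched_hit)
```

In this toy setup the global feature alone under-represents the colour attribute, so the baseline query lands on the grey dog, while the database-enriched token recovers the red dog on the beach. The example only illustrates why neighbour images and captions can emphasize shared attributes; the second stream trained on pseudo-triplets is not modelled here.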

Code Implementations: 1 repo
