CV · Feb 11, 2025

PDV: Prompt Directional Vectors for Zero-shot Composed Image Retrieval

arXiv:2502.07215v3 · 2 citations · h-index: 57
Originality: Incremental advance
AI Analysis

This work improves image search that combines a reference image with a text prompt. The contribution is incremental because it builds on existing zero-shot methods.

The paper tackles zero-shot composed image retrieval, where existing methods are limited by static query embeddings and suboptimal fusion of text and image embeddings. It introduces Prompt Directional Vectors (PDV) to capture the semantic modifications that a prompt induces, and reports improved retrieval performance across benchmarks.

Zero-shot Composed Image Retrieval (ZS-CIR) enables image search using a reference image and a text prompt without requiring specialized text-image composition networks trained on large-scale paired data. However, current ZS-CIR approaches suffer from three critical limitations in their reliance on composed text embeddings: static query embedding representations, insufficient utilization of image embeddings, and suboptimal performance when fusing text and image embeddings. To address these challenges, we introduce the Prompt Directional Vector (PDV), a simple yet effective training-free enhancement that captures semantic modifications induced by user prompts. PDV enables three key improvements: (1) dynamic composed text embeddings where prompt adjustments are controllable via a scaling factor, (2) composed image embeddings through semantic transfer from text prompts to image features, and (3) weighted fusion of composed text and image embeddings that enhances retrieval by balancing visual and semantic similarity. Our approach serves as a plug-and-play enhancement for existing ZS-CIR methods with minimal computational overhead. Extensive experiments across multiple benchmarks demonstrate that PDV consistently improves retrieval performance when integrated with state-of-the-art ZS-CIR approaches, particularly for methods that generate accurate compositional embeddings. The code will be released upon publication.
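The three improvements listed in the abstract can be illustrated with a small sketch. The abstract does not give formulas, so this assumes that the PDV is the difference between the composed text embedding and the text embedding of the reference alone, that it is added to a base embedding with a scaling factor, and that fusion is a weighted sum of cosine similarities. All function names, the toy 3-dimensional vectors, and the parameter values are hypothetical; a real system would take its embeddings from a vision-language encoder.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0 else [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def prompt_directional_vector(composed_text_emb, reference_text_emb):
    # Assumption: the PDV is the shift that the prompt induces in text space.
    return [c - r for c, r in zip(composed_text_emb, reference_text_emb)]

def shift(base_emb, pdv, scale):
    # Move a base embedding along the PDV; `scale` controls how strongly
    # the prompt modifies the query (scale = 0 leaves the base unchanged).
    return normalize([b + scale * d for b, d in zip(base_emb, pdv)])

def fused_scores(text_query, image_query, gallery, weight):
    # Weighted sum of the similarities from the composed text query and the
    # composed image query; `weight` balances semantic and visual similarity.
    return [
        weight * cosine(text_query, g) + (1 - weight) * cosine(image_query, g)
        for g in gallery
    ]

# Toy embeddings (hypothetical values, not from the paper).
reference_text = [1.0, 0.0, 0.0]   # text embedding describing the reference image
composed_text = [1.0, 1.0, 0.0]    # text embedding of reference plus prompt
reference_image = [0.9, 0.0, 0.1]  # image embedding of the reference image

pdv = prompt_directional_vector(composed_text, reference_text)
text_query = shift(reference_text, pdv, scale=1.0)    # (1) dynamic composed text embedding
image_query = shift(reference_image, pdv, scale=1.0)  # (2) composed image embedding

gallery = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
scores = fused_scores(text_query, image_query, gallery, weight=0.5)  # (3) weighted fusion
best = max(range(len(gallery)), key=lambda i: scores[i])
```

In this toy setup the second gallery item, which matches both the reference content and the prompt direction, receives the highest fused score. Changing `scale` or `weight` shows how the two hypothetical knobs trade off prompt strength and visual similarity.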
