CV | Jan 10, 2024

Modality-Aware Representation Learning for Zero-shot Sketch-based Image Retrieval

Eunyi Lyou, Doyeon Lee, Jooeun Kim, Joonseok Lee

arXiv:2401.04860v17.610 citations | h-index: 22 | Has Code | WACV

Originality: Incremental advance

AI Analysis

This work addresses the cost of collecting data for unseen categories in real-world retrieval scenarios, though it appears incremental because it builds on existing zero-shot and cross-modal methods.

The paper tackles zero-shot sketch-based image retrieval with a framework that aligns sketches and photos indirectly through text, eliminating the need for paired sketch-photo samples and enabling effective cross-modal retrieval in a joint latent space.

Zero-shot learning offers an efficient solution for a machine learning model to treat unseen categories, avoiding exhaustive data collection. Zero-shot Sketch-based Image Retrieval (ZS-SBIR) simulates real-world scenarios where it is hard and costly to collect paired sketch-photo samples. We propose a novel framework that indirectly aligns sketches and photos by contrasting them through texts, removing the necessity of access to sketch-photo pairs. With an explicit modality encoding learned from data, our approach disentangles modality-agnostic semantics from modality-specific information, bridging the modality gap and enabling effective cross-modal content retrieval within a joint latent space. From comprehensive experiments, we verify the efficacy of the proposed model on ZS-SBIR, and it can also be applied to generalized and fine-grained settings.
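
The idea of aligning sketches and photos only through text can be illustrated with a small contrastive-loss sketch. This is not the authors' implementation: the function names `info_nce` and `text_anchored_loss`, the temperature of 0.07, and the use of a symmetric InfoNCE objective are all assumptions made for illustration, and the learned modality encoding described in the abstract is left out. The point it shows is that each sketch and each photo is contrasted against the text embedding of its own class label, so no sketch-photo pair is ever needed.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(queries, keys, temperature=0.07):
    """Mean cross-entropy where queries[i] is expected to match keys[i].

    Hypothetical helper: every other key in the batch acts as a negative.
    """
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(q)


def text_anchored_loss(sketch_emb, sketch_text_emb, photo_emb, photo_text_emb,
                       temperature=0.07):
    """Contrast sketches with text and photos with text, never sketch with photo.

    Assumption: row i of each *_text_emb list is the text embedding of the
    class label of row i in the matching image list, and each batch holds
    one sample per class.
    """
    sketch_loss = 0.5 * (info_nce(sketch_emb, sketch_text_emb, temperature)
                         + info_nce(sketch_text_emb, sketch_emb, temperature))
    photo_loss = 0.5 * (info_nce(photo_emb, photo_text_emb, temperature)
                        + info_nce(photo_text_emb, photo_emb, temperature))
    return sketch_loss + photo_loss


# Toy 2-D embeddings: two classes, one sketch and one photo per class.
text = [[1.0, 0.0], [0.0, 1.0]]
sketches = [[0.9, 0.1], [0.1, 0.9]]
photos = [[0.8, 0.2], [0.2, 0.8]]
loss = text_anchored_loss(sketches, text, photos, text)
```

Because both modalities are pulled toward the same text anchors, sketches and photos of one class end up close together in the shared space even though they are never compared directly during training.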
