CV · Jun 5, 2023

LRVS-Fashion: Extending Visual Search with Referring Instructions

arXiv:2306.02928v3 · 4 citations · h-index: 26 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the problem of precise visual search in fashion for industry applications. The advance is incremental: it builds on existing trends in contrastive learning and dataset creation.

The paper tackles the ambiguity in fashion image similarity search by introducing Referred Visual Search (RVS), a task in which users specify the desired similarity with a referring instruction. It demonstrates that a weakly-supervised conditional contrastive learning method outperforms detection-based baselines on Recall at one against 2M distractors.

This paper introduces a new challenge for image similarity search in the context of fashion, addressing the inherent ambiguity in this domain stemming from complex images. We present Referred Visual Search (RVS), a task allowing users to define more precisely the desired similarity, following recent interest in the industry. We release a new large public dataset, LRVS-Fashion, consisting of 272k fashion products with 842k images extracted from fashion catalogs, designed explicitly for this task. However, unlike traditional visual search methods in the industry, we demonstrate that superior performance can be achieved by bypassing explicit object detection and adopting weakly-supervised conditional contrastive learning on image tuples. Our method is lightweight and demonstrates robustness, reaching Recall at one superior to strong detection-based baselines against 2M distractors. The dataset is available at https://huggingface.co/datasets/Slep/LAION-RVS-Fashion .
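To make the two quantities named in the abstract concrete, here is a minimal sketch of a contrastive objective over (query, target) embedding pairs and of Recall at one against distractors. It is an illustration under stated assumptions, not the authors' implementation: the embeddings are assumed to be precomputed (in the paper's setting the query embedding would already be conditioned on the referring instruction), InfoNCE is used as a common stand-in for the contrastive loss, and the function names `info_nce` and `recall_at_1` as well as the toy vectors are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(queries, targets, temperature=0.1):
    """Mean InfoNCE loss (assumed stand-in for the paper's contrastive loss).

    queries[i] is paired with targets[i]; the other targets in the batch
    act as negatives.
    """
    qs = [normalize(q) for q in queries]
    ts = [normalize(t) for t in targets]
    total = 0.0
    for i, q in enumerate(qs):
        logits = [dot(q, t) / temperature for t in ts]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(qs)


def recall_at_1(queries, targets, distractors):
    """Fraction of queries whose nearest gallery item is their own target.

    The gallery is the targets followed by the distractors, so gallery
    index i is the correct answer for queries[i].
    """
    gallery = [normalize(g) for g in targets + distractors]
    hits = 0
    for i, q in enumerate(queries):
        q = normalize(q)
        sims = [dot(q, g) for g in gallery]
        best = max(range(len(gallery)), key=sims.__getitem__)
        hits += int(best == i)
    return hits / len(queries)


# Hypothetical 2-D toy embeddings, for illustration only.
queries = [[1.0, 0.1], [0.1, 1.0]]
targets = [[0.9, 0.0], [0.0, 0.8]]
distractors = [[0.7, 0.7], [-1.0, 0.0]]

print(recall_at_1(queries, targets, distractors))
print(info_nce(queries, targets))
```

Adding distractors can only keep Recall at one the same or lower it, because each extra gallery item is one more candidate that may outrank the true target. This is why reporting the metric against a large distractor pool is a stricter test than retrieval among the targets alone.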

Code Implementations: 2 repos
