CVIRMar 17

Visual Product Search Benchmark

arXiv:2603.171869.7h-index: 5Has Code
Predicted impact top 85% in CV · last 90 daysOriginality Synthesis-oriented
AI Analysis

This work provides a structured benchmark for practitioners and researchers to assess visual product search systems, addressing the need for reliable identification in industrial applications, but it is incremental as it focuses on evaluation rather than new methods.

The authors benchmarked modern visual embedding models for instance-level product image retrieval, evaluating them on industrial and public datasets under realistic conditions to compare foundation models with domain-specific ones.

Reliable product identification from images is a critical requirement in industrial and commercial applications, particularly in maintenance, procurement, and operational workflows where incorrect matches can lead to costly downstream failures. At the core of such systems lies the visual search component, which must retrieve and rank the exact object instance from large and continuously evolving catalogs under diverse imaging conditions. This report presents a structured benchmark of modern visual embedding models for instance-level image retrieval, with a focus on industrial applications. A curated set of open-source foundation embedding models, proprietary multi-modal embedding systems, and domain-specific vision-only models are evaluated under a unified image-to-image retrieval protocol. The benchmark includes curated datasets, which includes industrial datasets derived from production deployments in Manufacturing, Automotive, DIY, and Retail, as well as established public benchmarks. Evaluation is conducted without post-processing, isolating the retrieval capability of each model. The results provide insight into how well contemporary foundation and unified embedding models transfer to fine-grained instance retrieval tasks, and how they compare to models explicitly trained for industrial applications. By emphasizing realistic constraints, heterogeneous image conditions, and exact instance matching requirements, this benchmark aims to inform both practitioners and researchers about the strengths and limitations of current visual embedding approaches in production-level product identification systems. An interactive companion website presenting the benchmark results, evaluation details, and additional visualizations is available at https://benchmark.nyris.io.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes