CV | Jan 21

LookBench: A Live and Holistic Open Benchmark for Fashion Image Retrieval

arXiv:2601.14706v1 | 3 citations | h-index: 1 | Has Code
Originality: Synthesis-oriented
AI Analysis

The paper addresses the need for a dynamic, holistic evaluation framework for fashion image retrieval in e-commerce, though the contribution is incremental because it builds on existing benchmarks.

The paper introduces LookBench, a live benchmark for fashion image retrieval that includes both real product images and AI-generated images. It reports that the benchmark challenges existing models, many of which achieve below 60% Recall@1, while the authors' proprietary model achieves the best performance.

In this paper, we present LookBench, a live, holistic, and challenging benchmark for fashion image retrieval in real e-commerce settings. (We use the term "look" to reflect retrieval that mirrors how people shop: finding the exact item, a close substitute, or a visually consistent alternative.) LookBench includes both recent product images sourced from live websites and AI-generated fashion images, reflecting contemporary trends and use cases. Each test sample is time-stamped, and we intend to update the benchmark periodically, enabling contamination-aware evaluation aligned with declared training cutoffs. Grounded in our fine-grained attribute taxonomy, LookBench covers single-item and outfit-level retrieval across. Our experiments reveal that LookBench poses a significant challenge for strong baselines, with many models achieving below 60% Recall@1. Our proprietary model achieves the best performance on LookBench, and we release an open-source counterpart that ranks second, with both models attaining state-of-the-art results on legacy Fashion200K evaluations. LookBench is designed to be updated semi-annually with new test samples and progressively harder task variants, providing a durable measure of progress. We publicly release our leaderboard, dataset, evaluation code, and trained models.
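
To make the evaluation idea concrete, here is a minimal sketch of Recall@K combined with a contamination-aware filter that keeps only test samples time-stamped after a model's declared training cutoff. It is not the released LookBench evaluation code. The record fields (`item_id`, `embedding`, `timestamp`), the function names, the toy data, and the exact-item relevance rule are all assumptions made for illustration, and the benchmark's actual protocol may differ.

```python
from datetime import date
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_at_k(queries, gallery, k=1, training_cutoff=None):
    """Fraction of queries whose true item appears among the top-k gallery matches.

    Hypothetical record layout (an assumption, not the LookBench schema):
      query:   {"item_id": str, "embedding": list[float], "timestamp": date}
      gallery: {"item_id": str, "embedding": list[float]}
    If training_cutoff is given, only queries time-stamped strictly after it are scored.
    Returns None when no query survives the filter.
    """
    if training_cutoff is not None:
        queries = [q for q in queries if q["timestamp"] > training_cutoff]
    if not queries:
        return None
    hits = 0
    for q in queries:
        # Rank the gallery by similarity to the query, most similar first.
        ranked = sorted(
            gallery,
            key=lambda g: cosine(q["embedding"], g["embedding"]),
            reverse=True,
        )
        top_ids = {g["item_id"] for g in ranked[:k]}
        hits += q["item_id"] in top_ids
    return hits / len(queries)


# Toy data, invented for this example only.
gallery = [
    {"item_id": "dress", "embedding": [1.0, 0.0]},
    {"item_id": "boot", "embedding": [0.0, 1.0]},
]
queries = [
    {"item_id": "dress", "embedding": [0.9, 0.1], "timestamp": date(2025, 3, 1)},
    {"item_id": "boot", "embedding": [0.8, 0.2], "timestamp": date(2025, 9, 1)},
    {"item_id": "boot", "embedding": [0.1, 0.9], "timestamp": date(2025, 11, 1)},
]

overall = recall_at_k(queries, gallery, k=1)
after_cutoff = recall_at_k(queries, gallery, k=1, training_cutoff=date(2025, 6, 1))
```

On this toy data the second query retrieves the wrong item, so `overall` is 2/3. With the cutoff set to 2025-06-01, only the last two queries are scored and `after_cutoff` is 0.5. The filter makes the trade-off explicit: a later cutoff lowers the risk of scoring samples the model may have seen in training, at the cost of a smaller test set.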

