From Pixels to Purchase: Building and Evaluating a Taxonomy-Decoupled Visual Search Engine for Home Goods E-commerce
This paper addresses the challenge of robust, scalable visual search for e-commerce platforms, particularly in style-driven domains. The contribution is incremental in that it builds on existing retrieval methods.
The paper tackles noisy and inflexible visual search in e-commerce by proposing a taxonomy-decoupled architecture that combines classification-free region proposals with unified embeddings. Deployed on a home goods platform, the system improved retrieval quality and increased customer engagement. The paper also introduces an LLM-as-a-Judge framework for zero-shot evaluation, whose offline metrics correlate well with real-world outcomes.
Visual search is critical for e-commerce, especially in style-driven domains where user intent is subjective and open-ended. Existing industrial systems typically couple object detection with taxonomy-based classification and rely on catalog data for evaluation, which is prone to noise that limits robustness and scalability. We propose a taxonomy-decoupled architecture that uses classification-free region proposals and unified embeddings for similarity retrieval, enabling a more flexible and generalizable visual search. To overcome the evaluation bottleneck, we propose an LLM-as-a-Judge framework that assesses nuanced visual similarity and category relevance for query-result pairs in a zero-shot manner, removing dependence on human annotations or noise-prone catalog data. Deployed at scale on a global home goods platform, our system improves retrieval quality and yields a measurable uplift in customer engagement, while our offline evaluation metrics strongly correlate with real-world outcomes.
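To make the retrieval step concrete, the following is a minimal sketch of similarity retrieval over a single shared embedding space, with no category label involved in ranking. It is an illustration under stated assumptions, not the paper's implementation: the function names, the toy three-dimensional vectors, the product identifiers, and the use of cosine similarity are all hypothetical choices, and the region-proposal and embedding models are assumed to have already produced the vectors.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query: Sequence[float],
    catalog: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank catalog items by similarity to the query embedding.

    No taxonomy label is consulted: ranking depends only on the
    shared embedding space (an assumption of this sketch).
    """
    scored = [(pid, cosine_similarity(query, emb)) for pid, emb in catalog.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real ones would come from an image encoder.
catalog = {
    "sofa_a": [1.0, 0.0, 0.0],
    "lamp_b": [0.0, 1.0, 0.0],
    "sofa_c": [0.9, 0.1, 0.0],
}

# Embedding of one proposed region in the query image (also hypothetical).
query_region = [1.0, 0.05, 0.0]

results = retrieve_top_k(query_region, catalog, k=2)
for product_id, score in results:
    print(f"{product_id}: {score:.4f}")
```

At catalog scale an exhaustive scan like this would normally be replaced by an approximate nearest-neighbor index. The brute-force version is kept here only because it shows the ranking logic plainly.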