CL | AI | IR | Jan 7, 2025

TACLR: A Scalable and Efficient Retrieval-based Method for Industrial Product Attribute Value Identification

arXiv:2501.03835v4 | 4 citations | h-index: 5 | Has Code | ACL
Originality: Incremental advance
AI Analysis

The paper addresses a key task for improving product search and recommendation on e-commerce platforms, offering incremental improvements in scalability and in the handling of implicit and out-of-distribution values.

The paper tackles Product Attribute Value Identification (PAVI) in e-commerce by introducing TACLR, a retrieval-based method that handles implicit and out-of-distribution values and scales to large datasets. The method has been deployed on a real-world platform that processes millions of listings daily.

Product Attribute Value Identification (PAVI) involves identifying attribute values from product profiles, a key task for improving product search, recommendation, and business analytics on e-commerce platforms. However, existing PAVI methods face critical challenges, such as inferring implicit values, handling out-of-distribution (OOD) values, and producing normalized outputs. To address these limitations, we introduce Taxonomy-Aware Contrastive Learning Retrieval (TACLR), the first retrieval-based method for PAVI. TACLR formulates PAVI as an information retrieval task by encoding product profiles and candidate values into embeddings and retrieving values based on their similarity. It leverages contrastive training with taxonomy-aware hard negative sampling and employs adaptive inference with dynamic thresholds. TACLR offers three key advantages: (1) it effectively handles implicit and OOD values while producing normalized outputs; (2) it scales to thousands of categories, tens of thousands of attributes, and millions of values; and (3) it supports efficient inference for high-load industrial deployment. Extensive experiments on proprietary and public datasets validate the effectiveness and efficiency of TACLR. Further, it has been successfully deployed on the real-world e-commerce platform Xianyu, processing millions of product listings daily with frequently updated, large-scale attribute taxonomies. We release the code to facilitate reproducibility and future research at https://github.com/SuYindu/TACLR.
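
The retrieval formulation described in the abstract can be illustrated with a minimal sketch. It is not the released TACLR code. The names `cosine`, `retrieve_value`, and `info_nce` are hypothetical, the embeddings are toy hand-written vectors rather than encoder outputs, and two details are assumptions the abstract does not spell out: the dynamic threshold is modeled here as the similarity to a per-attribute "null" embedding, and the hard negatives are taken to be other values of the same attribute.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_value(profile_emb, candidates, null_emb):
    """Return the most similar candidate value, or None.

    Assumption: the threshold is the similarity between the profile and a
    "null" embedding, so it changes per profile instead of being a fixed
    constant. A candidate is returned only if it beats that threshold.
    """
    best_value, best_score = None, cosine(profile_emb, null_emb)
    for value, emb in candidates.items():
        score = cosine(profile_emb, emb)
        if score > best_score:
            best_value, best_score = value, score
    return best_value


def info_nce(profile_emb, positive_emb, negative_embs, temperature=0.05):
    """InfoNCE-style contrastive loss for one profile.

    Assumption: negative_embs holds hard negatives, for example other
    values of the same attribute in the same category.
    """
    pos = cosine(profile_emb, positive_emb) / temperature
    logits = [pos] + [cosine(profile_emb, n) / temperature for n in negative_embs]
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - pos


# Toy usage: two candidate values for a hypothetical "color" attribute.
colors = {"red": [1.0, 0.0, 0.0], "blue": [0.0, 1.0, 0.0]}
null = [0.5, 0.5, 0.7]
print(retrieve_value([0.9, 0.1, 0.0], colors, null))  # red
print(retrieve_value([0.0, 0.0, 1.0], colors, null))  # None
```

Because candidate values are just entries in a dictionary of embeddings, a new value can be added without retraining a classifier head, which is one way to read the abstract's claim about handling out-of-distribution values and frequently updated taxonomies.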

Code Implementations: 1 repo