CL LG · Dec 14, 2018

Don't Classify, Translate: Multi-Level E-Commerce Product Categorization Via Machine Translation

Maggie Yundi Li, Stanley Kok, Liling Tan

arXiv:1812.05774v1 · 1.119 citations

Originality: Highly original

AI Analysis

The paper addresses multi-level product categorization for e-commerce platforms and proposes a new paradigm for it rather than an incremental improvement on existing classifiers.

The paper recasts e-commerce product categorization as machine translation: product descriptions are translated into root-to-leaf taxonomy paths. On two large real-world datasets, this approach achieves better predictive accuracy than a state-of-the-art classification system.
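
To make the translation framing concrete, the following is a minimal sketch of how a training pair could be formed, assuming each taxonomy node becomes one target token and the path is written root to leaf. The category names, the lowercase/underscore token scheme, and the helpers `node_token`, `path_to_target`, `make_parallel_pair`, and `target_to_path` are hypothetical illustrations, not the paper's actual preprocessing.

```python
from typing import List, Tuple


def node_token(name: str) -> str:
    """Turn a category name into one whitespace-free target token (hypothetical scheme)."""
    return name.strip().lower().replace(" ", "_")


def path_to_target(path: List[str]) -> str:
    """Encode a root-to-leaf taxonomy path as a target 'sentence' for a seq2seq model."""
    return " ".join(node_token(n) for n in path)


def make_parallel_pair(title: str, path: List[str]) -> Tuple[str, str]:
    """Source side: the product text. Target side: the path tokens."""
    return title.lower(), path_to_target(path)


def target_to_path(decoded: str) -> List[str]:
    """Split a decoded target sequence back into node tokens, root first."""
    return decoded.split()


src, tgt = make_parallel_pair(
    "Wireless Bluetooth Earbuds with Charging Case",
    ["Electronics", "Audio", "Headphones", "True Wireless Earbuds"],
)
print(src)  # wireless bluetooth earbuds with charging case
print(tgt)  # electronics audio headphones true_wireless_earbuds
print(target_to_path(tgt)[-1])  # true_wireless_earbuds
```

Under this framing, a conventional classifier would predict only the final token, whereas a translation model emits the whole path one node at a time.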

E-commerce platforms categorize their products into a multi-level taxonomy tree with thousands of leaf categories. Conventional methods for product categorization are typically based on machine learning classification algorithms. These algorithms take product information as input (e.g., titles and descriptions) to classify a product into a leaf category. In this paper, we propose a new paradigm based on machine translation. In our approach, we translate a product's natural language description into a sequence of tokens representing a root-to-leaf path in a product taxonomy. In our experiments on two large real-world datasets, we show that our approach achieves better predictive accuracy than a state-of-the-art classification system for product categorization. In addition, we demonstrate that our machine translation models can propose meaningful new paths between previously unconnected nodes in a taxonomy tree, thereby transforming the taxonomy into a directed acyclic graph (DAG). We discuss how the resultant taxonomy DAG promotes user-friendly navigation, and how it is more adaptable to new products.
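The abstract's claim that predicted paths can turn the taxonomy tree into a DAG can be illustrated with a small sketch. It assumes the taxonomy is stored as a parent-to-children adjacency map and that any consecutive node pair in a decoded path that is not already an edge is a candidate new edge. The cycle check is our own assumption, added so the result stays acyclic; the paper's procedure for accepting new paths may differ, and all function names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def build_tree(paths: List[List[str]]) -> Dict[str, Set[str]]:
    """Build a parent -> children adjacency map from known root-to-leaf paths."""
    children: Dict[str, Set[str]] = defaultdict(set)
    for path in paths:
        for parent, child in zip(path, path[1:]):
            children[parent].add(child)
    return children


def reachable(graph: Dict[str, Set[str]], start: str, target: str) -> bool:
    """Iterative DFS: return True if `target` can be reached from `start`."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return False


def add_predicted_path(graph: Dict[str, Set[str]], path: List[str]) -> List[Edge]:
    """Add edges from a decoded path that are not yet in the graph.

    An edge is skipped if it already exists or if adding it would create a
    cycle (i.e. the parent is already reachable from the child).
    """
    added: List[Edge] = []
    for parent, child in zip(path, path[1:]):
        if child in graph.get(parent, set()):
            continue  # edge already present in the taxonomy
        if parent == child or reachable(graph, child, parent):
            continue  # would break acyclicity
        graph.setdefault(parent, set()).add(child)
        added.append((parent, child))
    return added


g = build_tree([["electronics", "audio", "headphones"],
                ["electronics", "phones", "accessories"]])
print(add_predicted_path(g, ["electronics", "phones", "headphones"]))
# [('phones', 'headphones')]: 'headphones' now has two parents
print(add_predicted_path(g, ["headphones", "audio"]))  # [] (would create a cycle)
```

After the first call, `headphones` is reachable through both `audio` and `phones`. This multi-parent structure is what lets a DAG offer more than one navigation route to the same category.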
