Categorizing Items with Short and Noisy Descriptions using Ensembled Transferred Embeddings
This work addresses item categorization on e-commerce platforms, with the aim of improving user experience and operations; its contribution is incremental, since it builds on established transfer learning and ensembling techniques.
The paper tackles categorization of e-commerce items that have short, noisy descriptions and no available labels. It proposes the Ensembled Transferred Embeddings (ETE) framework, which combines semi-automatic labeling with transferable embeddings, and shows that it significantly outperforms traditional and state-of-the-art methods on a large-scale PayPal dataset.
Item categorization is a machine learning task that aims to assign e-commerce items, typically represented by textual attributes, to the most suitable category from a predefined set of categories. An accurate item categorization system is essential for improving both the user experience and the operational processes of the company. In this work, we focus on item categorization settings in which the textual attributes representing items are short and noisy, and labels (i.e., accurate assignments of items to categories) are not available. To cope with such settings, we propose a novel learning framework, Ensembled Transferred Embeddings (ETE), which relies on two key ideas: 1) labeling a relatively small sample of the target dataset in a semi-automatic process, and 2) leveraging large-scale labeled datasets from related domains or related tasks to extract "transferable embeddings". An evaluation of ETE on a large-scale real-world dataset provided to us by PayPal shows that it significantly outperforms both traditional and state-of-the-art item categorization methods.
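The abstract does not say how ETE combines its embeddings, so the following is only a rough illustration of the general idea under stated assumptions, not the authors' implementation. It assumes each "transferred embedding" is a frozen text-to-vector function; here these are replaced by hypothetical hashed character n-gram encoders (`hashed_char_ngrams`) rather than encoders trained on large labeled related datasets. A nearest-centroid classifier is fit per embedding on a small labeled sample, and the ensemble averages the per-embedding cosine-similarity scores. All names (`CentroidClassifier`, `EnsembleClassifier`, the toy categories) are hypothetical.

```python
import math
import zlib
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Embedder = Callable[[str], Vector]


def hashed_char_ngrams(n: int, dim: int = 256) -> Embedder:
    """Hypothetical stand-in for a transferred embedding: hashed char n-gram counts."""
    def embed(text: str) -> Vector:
        vec = [0.0] * dim
        padded = f" {text.lower()} "
        for i in range(len(padded) - n + 1):
            # crc32 keeps bucket assignment deterministic across runs
            vec[zlib.crc32(padded[i:i + n].encode("utf-8")) % dim] += 1.0
        return vec
    return embed


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


class CentroidClassifier:
    """Nearest-centroid classifier on top of one frozen embedding."""

    def __init__(self, embed: Embedder) -> None:
        self.embed = embed
        self.centroids: Dict[str, Vector] = {}

    def fit(self, texts: Sequence[str], labels: Sequence[str]) -> "CentroidClassifier":
        sums: Dict[str, Vector] = {}
        for text, label in zip(texts, labels):
            v = normalize(self.embed(text))
            acc = sums.setdefault(label, [0.0] * len(v))
            for i, x in enumerate(v):
                acc[i] += x
        # Re-normalizing the summed vectors gives unit-length class centroids
        self.centroids = {label: normalize(s) for label, s in sums.items()}
        return self

    def scores(self, text: str) -> Dict[str, float]:
        v = normalize(self.embed(text))
        # Cosine similarity between the item and each class centroid
        return {label: sum(a * b for a, b in zip(v, c))
                for label, c in self.centroids.items()}


class EnsembleClassifier:
    """Averages per-embedding class scores and returns the top-scoring category."""

    def __init__(self, embedders: Sequence[Embedder]) -> None:
        self.members = [CentroidClassifier(e) for e in embedders]

    def fit(self, texts: Sequence[str], labels: Sequence[str]) -> "EnsembleClassifier":
        for member in self.members:
            member.fit(texts, labels)
        return self

    def predict(self, text: str) -> str:
        totals: Dict[str, float] = defaultdict(float)
        for member in self.members:
            for label, s in member.scores(text).items():
                totals[label] += s / len(self.members)
        return max(totals, key=totals.get)


# Tiny hand-labeled sample standing in for the semi-automatically labeled subset
texts = ["iphone 12 case black", "samsung galaxy charger usb",
         "mens running shoes sz 10", "nike air sneakers white"]
labels = ["electronics", "electronics", "apparel", "apparel"]

model = EnsembleClassifier([hashed_char_ngrams(2), hashed_char_ngrams(3)]).fit(texts, labels)
print(model.predict("galaxy usb charger"))
print(model.predict("white running sneakers"))
```

Character n-grams are used here only because they tolerate the misspellings and abbreviations typical of short, noisy item titles; in a setting like the one described, each embedder would instead come from a model trained on a large labeled related dataset, and the ensemble could weight members rather than average them uniformly.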