LG · Jul 29, 2023

Multi-output Headed Ensembles for Product Item Classification

arXiv:2307.15858v1 · h-index: 17
Originality: Incremental advance
AI Analysis

This work addresses taxonomy classification with noisy labels on e-commerce platforms, offering incremental improvements in model robustness and evaluation.

The paper tackles product item classification in large e-commerce catalogs, where noisy merchant labels and a lack of curated training data degrade model performance. It proposes an extensible deep learning framework that combines ensembles with metadata features and shows gains over hyperparameter-optimized baselines. It also introduces a novel evaluation method that uses user sessions to assess model performance beyond precision and recall, easing the evaluation bottleneck in deployment.
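The summary mentions combining ensembles with metadata features. The following is a minimal sketch of one possible reading of a multi-output headed averaging ensemble, not the authors' architecture: several linear output heads score the same fused input (a text feature vector concatenated with metadata features), and the prediction is the element-wise mean of the heads' softmax distributions. The class `HeadedAveragingEnsemble`, the concatenation-based fusion, the feature values, and the untrained random weights are all illustrative assumptions.

```python
import math
import random
from itertools import starmap
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class HeadedAveragingEnsemble:
    """Hypothetical sketch: K linear output heads share one fused input
    (text features + metadata features); the prediction averages their
    class-probability distributions."""

    def __init__(self, input_dim: int, num_genres: int, num_heads: int = 3, seed: int = 0):
        rng = random.Random(seed)
        self.input_dim = input_dim
        # Untrained random weights stand in for separately trained heads (assumption).
        self.heads = [
            [[rng.gauss(0.0, 0.1) for _ in range(input_dim)] for _ in range(num_genres)]
            for _ in range(num_heads)
        ]

    @staticmethod
    def fuse(text_features: Sequence[float], meta_features: Sequence[float]) -> List[float]:
        # Early fusion by plain concatenation (an assumption, not the paper's design).
        return list(text_features) + list(meta_features)

    def predict_proba(self, text_features: Sequence[float],
                      meta_features: Sequence[float]) -> List[float]:
        x = self.fuse(text_features, meta_features)
        if len(x) != self.input_dim:
            raise ValueError(f"expected {self.input_dim} fused features, got {len(x)}")
        per_head = []
        for weights in self.heads:
            # One logit per genre: dot product of that genre's weight row with x.
            logits = [sum(starmap(lambda w, xi: w * xi, zip(row, x))) for row in weights]
            per_head.append(softmax(logits))
        # Averaging ensemble: element-wise mean of the heads' distributions.
        return [sum(col) / len(per_head) for col in zip(*per_head)]

    def predict(self, text_features: Sequence[float], meta_features: Sequence[float]) -> int:
        probs = self.predict_proba(text_features, meta_features)
        return max(range(len(probs)), key=probs.__getitem__)


# Usage with made-up features: a 3-d "title embedding" plus a 2-d one-hot metadata field.
model = HeadedAveragingEnsemble(input_dim=5, num_genres=4, num_heads=3)
text_vec = [0.2, -1.0, 0.5]
meta_vec = [1.0, 0.0]
probs = model.predict_proba(text_vec, meta_vec)
genre_id = model.predict(text_vec, meta_vec)
```

Averaging probability distributions, rather than taking a majority vote, keeps each head's confidence in play, and concatenation is the simplest form of early fusion; the paper's heads and fusion scheme may differ.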

In this paper, we revisit the problem of product item classification for large-scale e-commerce catalogs. The taxonomy of an e-commerce catalog consists of thousands of genres, to which merchants continuously assign the items they upload. These merchant genre assignments are often wrong, yet they are treated as ground-truth labels in automatically generated training sets, creating a feedback loop that degrades model quality over time. The problem is exacerbated by the unavailability of sizable curated training sets. In such a scenario, it is common to combine multiple classifiers to counter the poor generalization of any single classifier. We propose an extensible deep-learning-based classification framework that benefits from the simplicity and robustness of averaging ensembles and fusion-based classifiers. The framework can also use metadata features and low-level feature engineering to boost classification performance. We show these improvements against robust, industry-standard baseline models that employ hyperparameter optimization. Additionally, because real-world, high-volume e-commerce catalogs undergo continuous insertions, deletions, and updates, assessing model performance for deployment through A/B testing and/or manual annotation becomes a bottleneck. To address this, we also propose a novel way to evaluate model performance using user sessions, which provides insights beyond the traditional measures of precision and recall.
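The abstract does not say how user sessions enter the evaluation. As a purely hypothetical illustration of a label-free, session-based signal, the sketch below assumes that items a user interacts with in one session tend to share a genre, and scores a model by the average fraction of same-session item pairs that receive the same predicted genre. The function name `session_genre_agreement`, the item IDs, the genres, and the metric itself are assumptions, not the paper's method.

```python
from itertools import combinations
from typing import Dict, Sequence


def session_genre_agreement(sessions: Sequence[Sequence[str]],
                            predicted_genre: Dict[str, str]) -> float:
    """Hypothetical label-free proxy: mean, over sessions, of the fraction of
    item pairs in a session whose predicted genres match."""
    scores = []
    for items in sessions:
        genres = [predicted_genre[i] for i in items if i in predicted_genre]
        pairs = list(combinations(genres, 2))
        if not pairs:
            continue  # sessions with fewer than two classified items carry no signal
        matches = sum(1 for a, b in pairs if a == b)
        scores.append(matches / len(pairs))
    return sum(scores) / len(scores) if scores else 0.0


# Usage with made-up sessions and predictions from two candidate models.
sessions = [["i1", "i2", "i3"], ["i4", "i5"], ["i6"]]
model_a = {"i1": "shoes", "i2": "shoes", "i3": "shoes", "i4": "toys", "i5": "toys", "i6": "books"}
model_b = {"i1": "shoes", "i2": "bags", "i3": "shoes", "i4": "toys", "i5": "books", "i6": "books"}
score_a = session_genre_agreement(sessions, model_a)
score_b = session_genre_agreement(sessions, model_b)
```

A model that assigns every item to a single genre would score 1.0, so a proxy like this can only complement precision and recall, never replace them.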
