LG · AI · IR · Aug 27, 2025

Cross-Platform E-Commerce Product Categorization and Recategorization: A Multimodal Hierarchical Classification Approach

arXiv:2508.20013v2 · h-index: 5
Originality: Incremental advance
AI Analysis

The paper addresses an industrial need in e-commerce, accurate and scalable product categorization, though its contribution is incremental: it combines existing methods rather than introducing new ones.

This study tackled the problem of e-commerce product categorization across heterogeneous platforms by developing a multimodal hierarchical classification framework, achieving a hierarchical F1 score of 98.59% and discovering new fine-grained categories with cluster purities above 86%.
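To make the "cluster purity" figure concrete, here is a minimal sketch of one common definition: the share of items in a cluster that carry the cluster's majority label. The summary does not state the exact formula the authors used, so this definition, the function name `cluster_purities`, and the toy data are all assumptions for illustration only.

```python
from collections import Counter, defaultdict


def cluster_purities(cluster_ids, labels):
    """Return {cluster_id: purity}; purity is the majority label's share of the cluster."""
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, labels):
        members[cid].append(label)
    return {
        cid: Counter(lbls).most_common(1)[0][1] / len(lbls)
        for cid, lbls in members.items()
    }


# Toy data invented for illustration; not taken from the paper.
clusters = [0, 0, 0, 0, 1, 1, 1]
labels = ["sneaker", "sneaker", "sneaker", "boot", "sandal", "sandal", "sandal"]
purities = cluster_purities(clusters, labels)
print(purities)
```

Under this definition, a purity above 86% would mean that more than 86% of the products in a discovered cluster share a single reference label.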

This study addresses critical industrial challenges in e-commerce product categorization, namely platform heterogeneity and the structural limitations of existing taxonomies, by developing and deploying a multimodal hierarchical classification framework. Using a dataset of 271,700 products from 40 international fashion e-commerce platforms, we integrate textual features (RoBERTa), visual features (ViT), and joint vision-language representations (CLIP). We investigate fusion strategies, including early, late, and attention-based fusion within a hierarchical architecture enhanced by dynamic masking to ensure taxonomic consistency. Results show that CLIP embeddings combined via an MLP-based late-fusion strategy achieve the highest hierarchical F1 (98.59%), outperforming unimodal baselines. To address shallow or inconsistent categories, we further introduce a self-supervised "product recategorization" pipeline using SimCLR, UMAP, and cascade clustering, which discovered new, fine-grained categories (for example, subtypes of "Shoes") with cluster purities above 86%. Cross-platform experiments reveal a deployment-relevant trade-off: complex late-fusion methods maximize accuracy with diverse training data, while simpler early-fusion methods generalize more effectively to unseen platforms. Finally, we demonstrate the framework's industrial scalability through deployment in EURWEB's commercial transaction intelligence platform via a two-stage inference pipeline, combining a lightweight RoBERTa stage with a GPU-accelerated multimodal stage to balance cost and accuracy.
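The abstract mentions dynamic masking for taxonomic consistency but gives no implementation details. The sketch below shows one plausible reading, assuming a two-level taxonomy: once a parent category is predicted, the scores of leaf categories outside that parent are set to negative infinity so they can never be selected. The taxonomy, the names `CHILDREN` and `predict_with_mask`, and the logit values are hypothetical, and the authors' actual mechanism may differ.

```python
import math

# Hypothetical two-level taxonomy (not from the paper).
CHILDREN = {
    "Shoes": ["Sneakers", "Boots"],
    "Tops": ["T-Shirts", "Shirts"],
}
PARENTS = list(CHILDREN)
LEAVES = [leaf for kids in CHILDREN.values() for leaf in kids]


def argmax(scores):
    """Index of the largest score."""
    return max(range(len(scores)), key=scores.__getitem__)


def predict_with_mask(parent_logits, leaf_logits):
    """Pick the parent first, then the best leaf among that parent's children only."""
    parent = PARENTS[argmax(parent_logits)]
    allowed = set(CHILDREN[parent])
    # Mask out leaves that do not belong to the predicted parent.
    masked = [
        score if leaf in allowed else -math.inf
        for leaf, score in zip(LEAVES, leaf_logits)
    ]
    return parent, LEAVES[argmax(masked)]


# Unmasked, the highest leaf score belongs to "T-Shirts", which would
# contradict the predicted parent "Shoes"; the mask prevents that.
parent, leaf = predict_with_mask([2.0, 0.5], [0.1, 0.3, 1.5, 0.2])
print(parent, leaf)
```

The design choice illustrated here is that consistency is enforced at prediction time rather than learned: the leaf classifier may still score an inconsistent category highly, but the mask guarantees the final (parent, leaf) pair is a valid path in the taxonomy.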
