AI · Jul 7, 2022

Multimodal E-Commerce Product Classification Using Hierarchical Fusion

arXiv:2207.03305v1 · 6 citations · h-index: 20
Originality: Synthesis-oriented
AI Analysis

This work addresses multimodal classification for e-commerce, but it is incremental, as it applies established fusion methods to a specific domain.

The paper tackles product classification by combining text and image features through simple fusion techniques, reporting significant improvements over both unimodal models and similar existing models.

In this work, we present a multimodal model for commercial product classification that combines features extracted by multiple neural network models from textual data (CamemBERT and FlauBERT) and visual data (SE-ResNeXt-50) using simple fusion techniques. The proposed method significantly outperformed both the unimodal models and the reported performance of similar models on our specific task. We experimented with multiple fusion techniques and found that the best-performing way to combine the individual embeddings of the unimodal networks is a combination of concatenating and averaging the feature vectors. Each modality compensated for the shortcomings of the others, demonstrating that increasing the number of modalities can be an effective way to improve performance on multi-label and multimodal classification problems.
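The abstract does not spell out how concatenation and averaging are combined. The sketch below assumes one plausible reading: the two text embeddings (standing in for CamemBERT and FlauBERT outputs) are averaged element-wise, which requires equal lengths, and the result is concatenated with the image embedding (standing in for SE-ResNeXt-50 features). The function names and the toy vectors are hypothetical and only illustrate the arithmetic; this is not the authors' implementation.

```python
from typing import List, Sequence

Vector = List[float]


def average(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized feature vectors (average fusion)."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("averaging requires vectors of the same length")
    return [sum(values) / len(vectors) for values in zip(*vectors)]


def concatenate(vectors: Sequence[Sequence[float]]) -> Vector:
    """Join feature vectors end to end (concatenation fusion); lengths may differ."""
    out: Vector = []
    for v in vectors:
        out.extend(v)
    return out


def hybrid_fusion(text_embeddings: Sequence[Sequence[float]],
                  image_embedding: Sequence[float]) -> Vector:
    """Hypothetical concat+average fusion: average the text embeddings,
    then append the image embedding to the averaged vector."""
    return concatenate([average(text_embeddings), image_embedding])


# Toy stand-ins for real model outputs (hypothetical values, tiny dimensions).
camembert_vec = [1.0, 2.0, 3.0]
flaubert_vec = [3.0, 2.0, 1.0]
image_vec = [0.5, 0.5, 0.5, 0.5]

fused = hybrid_fusion([camembert_vec, flaubert_vec], image_vec)
print(fused)  # [2.0, 2.0, 2.0, 0.5, 0.5, 0.5, 0.5]
```

Averaging keeps the text part of the fused vector at the size of a single text embedding, while concatenation lets modalities with different dimensions coexist. In a full model, the fused vector would then feed a classification head, which is outside what this sketch covers.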

