CV · Aug 9, 2023

Cross-view Semantic Alignment for Livestreaming Product Recognition

arXiv:2308.04912v2 · 3 citations · h-index: 11 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses product recognition in live streaming for e-commerce, but it is incremental as it builds on existing multimodal and contrastive learning approaches.

The authors tackled the problem of recognizing products in live commerce streams by introducing LPR4M, a large-scale multimodal dataset 50x larger than existing ones, and the RICE model, which achieved improved recognition through cross-view semantic alignment.

Live commerce is the act of selling products online through live streaming. Customers' diverse demands for online products introduce more challenges to Livestreaming Product Recognition. Previous works have primarily focused on fashion clothing data or used single-modal input, neither of which reflects the real-world scenario where multimodal data from various categories are present. In this paper, we present LPR4M, a large-scale multimodal dataset that covers 34 categories, comprises 3 modalities (image, video, and text), and is 50x larger than the largest publicly available dataset. LPR4M contains diverse videos and noisy modality pairs while exhibiting a long-tailed distribution, resembling real-world problems. Moreover, a cRoss-vIew semantiC alignmEnt (RICE) model is proposed to learn discriminative instance features from the image and video views of the products. This is achieved through instance-level contrastive learning and cross-view patch-level feature propagation. A novel Patch Feature Reconstruction loss is proposed to penalize the semantic misalignment between cross-view patches. Extensive experiments demonstrate the effectiveness of RICE and provide insights into the importance of dataset diversity and expressivity. The dataset and code are available at https://github.com/adxcreative/RICE
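To make the "instance-level contrastive learning" step more concrete, here is a minimal sketch of a symmetric InfoNCE-style loss between image-view and video-view embeddings of the same products. The abstract does not give the exact form of the loss, so this is a common formulation chosen for illustration, not the RICE implementation. The function names, the temperature of 0.07, and the toy embeddings are all assumptions.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(image_embs, video_embs, temperature=0.07):
    """Hypothetical instance-level loss: row i of each input is the same product.

    Matching image/video pairs are pulled together and all other pairs in the
    batch are pushed apart, in both the image-to-video and video-to-image
    directions.
    """
    imgs = [l2_normalize(e) for e in image_embs]
    vids = [l2_normalize(e) for e in video_embs]
    # Cosine similarity of every image with every video, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(img, vid)) / temperature for vid in vids]
        for img in imgs
    ]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Toy data (assumed): 4 products, 8-dimensional embeddings per view.
random.seed(0)
image_embs = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
video_embs = [[v + 0.01 * random.gauss(0, 1) for v in emb] for emb in image_embs]

aligned_loss = symmetric_info_nce(image_embs, video_embs)
mismatched_loss = symmetric_info_nce(image_embs, video_embs[::-1])
print(aligned_loss, mismatched_loss)
```

When the two views of each product are correctly paired, the loss is small. Reversing the video order so that every pair is mismatched makes it much larger, which is the signal an encoder would be trained to minimize.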

Code Implementations: 1 repo
