CV | Jun 26, 2023

Mutual Query Network for Multi-Modal Product Image Segmentation

arXiv:2306.14399v1 | h-index: 13
Originality: Incremental advance
AI Analysis

This paper addresses the problem of distinguishing irrelevant products in e-commerce image segmentation. The advance is incremental, as it builds on existing multi-modal approaches.

The paper tackles product image segmentation in e-commerce by proposing a mutual query network that integrates visual and linguistic modalities, achieving significant performance improvements over state-of-the-art methods on a new dataset of 30,000 images.

Product image segmentation is vital in e-commerce. Most existing methods extract the product image foreground based only on the visual modality, making it difficult to distinguish irrelevant products. As product titles contain abundant appearance information and provide complementary cues for product image segmentation, we propose a mutual query network to segment products based on both visual and linguistic modalities. First, we design a language query vision module to obtain the response of the language description in image areas, thus aligning the visual and linguistic representations across modalities. Then, a vision query language module utilizes the correlation between visual and linguistic modalities to filter the product title and effectively suppress the content in the title that is irrelevant to the image. To promote research in this field, we also construct a Multi-Modal Product Segmentation dataset (MMPS), which contains 30,000 images and corresponding titles. The proposed method significantly outperforms the state-of-the-art methods on MMPS.
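
The two query directions described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the module internals, so the pooling, the scaled dot-product scoring, the sigmoid and softmax choices, the function names, and the 2-dimensional toy features below are all assumptions made for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_pool(vectors):
    # Average a list of equal-length feature vectors into one vector.
    count = len(vectors)
    return [sum(v[d] for v in vectors) / count for d in range(len(vectors[0]))]

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def language_query_vision(word_feats, region_feats):
    # Assumed reading: the pooled title feature queries every image region,
    # giving a per-region response in (0, 1).
    query = mean_pool(word_feats)
    scale = math.sqrt(len(query))
    return [1.0 / (1.0 + math.exp(-dot(query, r) / scale)) for r in region_feats]

def vision_query_language(region_feats, word_feats):
    # Assumed reading: the pooled image feature queries every title word,
    # giving weights that down-weight words unrelated to the image.
    query = mean_pool(region_feats)
    scale = math.sqrt(len(query))
    return softmax([dot(query, w) / scale for w in word_feats])

# Hypothetical toy features: two product regions, one background region,
# and a title whose last word does not describe the image.
title = ["red", "shoe", "shipping"]
word_feats = [[1.0, 0.0], [1.0, 0.5], [-0.5, -1.0]]
region_feats = [[2.0, 0.0], [1.5, 0.5], [-1.0, 0.0]]

response = language_query_vision(word_feats, region_feats)
word_weights = vision_query_language(region_feats, word_feats)
print(response)
print(dict(zip(title, word_weights)))
```

In this toy setup the product regions receive a higher response than the background region, and the word "shipping" receives the smallest weight. A real model would learn projections for queries, keys, and values rather than use raw features.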

Code Implementations: 1 repo
