LG · CV · IR · Sep 13, 2023

ProMap: Datasets for Product Mapping in E-commerce

arXiv:2309.06882v1 · 2 citations · h-index: 2
Originality: Synthesis-oriented
AI Analysis

The paper addresses a weakness of existing product-mapping datasets: they carry incomplete product information or contain only distant non-matching pairs, so models trained on them cannot distinguish similar but different products in practice.

The paper tackles product mapping in e-commerce by introducing two new datasets, ProMapCz and ProMapEn, which contain 1,495 Czech and 1,555 English product pairs, respectively, with images and textual descriptions. Machine learning experiments demonstrate the complexity of the datasets.

The goal of product mapping is to decide whether two listings from two different e-shops describe the same product. Existing datasets of matching and non-matching product pairs, however, often suffer from incomplete product information or contain only very distant non-matching products. As a result, predictive models trained on these datasets achieve good results on them but are unusable in practice, because they cannot distinguish very similar but non-matching pairs of products. This paper introduces two new datasets for product mapping: ProMapCz, consisting of 1,495 Czech product pairs, and ProMapEn, consisting of 1,555 English product pairs. Both contain matching and non-matching products manually scraped from two pairs of e-shops. The datasets contain both images and textual descriptions of the products, including their specifications, making them among the most complete datasets for product mapping. Additionally, the non-matching products were selected in two phases, creating two types of non-matches: close non-matches and medium non-matches. Even the medium non-matches are pairs of products that are much more similar than the non-matches in other datasets; for example, they still need to have the same brand and a similar name and price. After simple data preprocessing, several machine learning algorithms were trained on these datasets and on two other datasets to demonstrate the complexity and completeness of the ProMap datasets. The ProMap datasets are presented as a gold standard for further research on product mapping, filling the gaps in existing datasets.
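
The constraint on medium non-matches (same brand, similar name, similar price) can be illustrated with a small filter. This is only a sketch under stated assumptions, not the authors' selection procedure: the abstract describes manual selection, and the `Listing` fields, the character-level similarity measure, and both thresholds below are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Listing:
    # Hypothetical fields; the actual ProMap schema may differ.
    name: str
    brand: str
    price: float


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1] on lowercased names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_similar_candidate(a: Listing, b: Listing,
                         min_name_sim: float = 0.5,
                         max_price_gap: float = 0.3) -> bool:
    """Check the three conditions: same brand, similar name, similar price.

    Both thresholds are assumed values, not taken from the paper.
    """
    if a.brand.lower() != b.brand.lower():
        return False
    if name_similarity(a.name, b.name) < min_name_sim:
        return False
    highest = max(a.price, b.price)
    if highest <= 0:
        return False  # prices must be positive to compare relative gaps
    return abs(a.price - b.price) / highest <= max_price_gap


if __name__ == "__main__":
    s21_128 = Listing("Galaxy S21 128GB Phantom Gray", "Samsung", 799.0)
    s21_256 = Listing("Galaxy S21 256GB Phantom Gray", "Samsung", 849.0)
    tablet = Listing("Galaxy Tab A8", "Samsung", 229.0)
    print(is_similar_candidate(s21_128, s21_256))
    print(is_similar_candidate(s21_128, tablet))
```

In this made-up example, the two phone variants pass the filter because they share a brand, differ in only a few characters of the name, and are close in price, while the tablet is rejected on price alone. A pair that passes such a filter would still need a human label to decide whether it is a match or a non-match.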
