DB AI LG Feb 1, 2024

Text-Based Product Matching -- Semi-Supervised Clustering Approach

Alicja Martinek, Szymon Łukasik, Amir H. Gandomi

arXiv:2402.10091v12.31 citations h-index: 4

Originality: Incremental advance

AI Analysis

The paper addresses the problem of reducing manual labeling effort for entity matching in e-commerce, though it appears incremental because it builds on existing clustering techniques.

The paper tackled product matching in e-commerce by proposing a semi-supervised clustering approach, showing that using a small annotated sample with the IDEC algorithm on real-world data could serve as an alternative to supervised methods that need extensive labeling.

Matching identical products present in multiple product feeds constitutes a crucial element of many tasks of e-commerce, such as comparing product offerings, dynamic price optimization, and selecting the assortment personalized for the client. It corresponds to the well-known machine learning task of entity matching, with its own specificity, like omnipresent unstructured data or inaccurate and inconsistent product descriptions. This paper aims to present a new philosophy to product matching utilizing a semi-supervised clustering approach. We study the properties of this method by experimenting with the IDEC algorithm on the real-world dataset using predominantly textual features and fuzzy string matching, with more standard approaches as a point of reference. Encouraging results show that unsupervised matching, enriched with a small annotated sample of product links, could be a possible alternative to the dominant supervised strategy, requiring extensive manual data labeling.
