CL · Jul 5, 2022

Block-SCL: Blocking Matters for Supervised Contrastive Learning in Product Matching

Mario Almagro, David Jiménez, Diego Ortego, Emilio Almazán, Eva Martínez

arXiv:2207.02008v10.84 citations · h-index: 11

Originality: Incremental advance

AI Analysis

This work addresses product matching in e-commerce. It is an incremental improvement that reuses the output of the existing blocking stage to strengthen training.

The paper tackles product matching by using the blocking output to enhance supervised contrastive learning, achieving state-of-the-art results on public datasets with only short product titles and a lighter transformer backbone.

Product matching is a fundamental step for the global understanding of consumer behavior in e-commerce. In practice, product matching refers to the task of deciding whether two product offers from different data sources (e.g. retailers) represent the same product. Standard pipelines use a preceding stage called blocking, where for a given product offer a set of potential matching candidates is retrieved based on similar characteristics (e.g. same brand, category, or flavor). From these similar product candidates, those that are not a match can be considered hard negatives. We present Block-SCL, a strategy that uses the blocking output to make the most of Supervised Contrastive Learning (SCL). Concretely, Block-SCL builds enriched batches using the hard-negative samples obtained in the blocking stage. These batches provide a strong training signal, leading the model to learn more meaningful sentence embeddings for product matching. Experimental results on several public datasets demonstrate that Block-SCL achieves state-of-the-art results despite using only short product titles as input, no data augmentation, and a lighter transformer backbone than competing methods.
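The batch-building idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`build_block_batch`, `supcon_loss`), the toy offers, and the blocking output are hypothetical, random vectors stand in for transformer title embeddings, and the loss is the commonly used supervised contrastive formulation, assumed here for illustration.

```python
import numpy as np

def build_block_batch(anchors, blocks, batch_size):
    """Collect each anchor with its blocking candidates, skipping duplicates.

    Candidates sharing the anchor's product label act as positives; the
    remaining candidates are the hard negatives returned by blocking.
    """
    batch, seen = [], set()
    for anchor in anchors:
        for offer in [anchor, *blocks.get(anchor, [])]:
            if offer not in seen and len(batch) < batch_size:
                seen.add(offer)
                batch.append(offer)
    return batch

def supcon_loss(embeddings, labels, temperature=0.07):
    """Supervised contrastive loss averaged over anchors that have a positive."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    # Scaled cosine similarities; an anchor is never compared with itself.
    sim = np.where(self_mask, -np.inf, z @ z.T / temperature)
    # Numerically stable log-softmax over all other samples in the batch.
    sim_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - sim_max - np.log(np.exp(sim - sim_max).sum(axis=1, keepdims=True))
    labels = np.asarray(labels)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0  # anchors without any positive are skipped
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float((-pos_log_prob[valid] / pos_count[valid]).mean())

# Hypothetical toy data: offer id -> product label, and a blocking output.
product_of = {"a1": "P1", "a2": "P1", "b1": "P2", "b2": "P2", "c1": "P3"}
blocks = {"a1": ["a2", "b1"], "b2": ["b1", "c1"]}

batch = build_block_batch(["a1", "b2"], blocks, batch_size=8)
labels = [product_of[offer] for offer in batch]

# Random vectors stand in for encoded product titles.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(batch), 16))
loss = supcon_loss(embeddings, labels)

# Embeddings that already separate the products give a near-zero loss.
ideal = np.eye(16)[[0, 0, 1, 1, 2]]
ideal_loss = supcon_loss(ideal, labels)
```

Because every negative in such a batch comes from a blocking result, each one resembles some anchor in brand, category, or similar attributes, which is what makes the contrastive signal harder than with randomly sampled negatives.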
