DB · Mar 12

BEACON: Budget-Aware Entity Matching Across Domains (Extended Technical Report)

arXiv:2603.11391v1 · 8.3 · h-index: 14
Predicted impact: top 53% in DB · last 90 days · Originality: Incremental advance
AI Analysis

This addresses the challenge of expensive labeling in e-commerce entity matching for users with scarce domain-specific data, though it is incremental as it builds on existing deep learning and transfer learning approaches.

The paper tackles the problem of entity matching across domains with limited labeled data by introducing BEACON, a budget-aware framework that uses embedding representations to select out-of-domain samples, achieving consistent outperformance over state-of-the-art methods in experiments across multiple datasets.

Entity Matching (EM), the task of determining whether two data records refer to the same real-world entity, is a core task in data integration. Recent advances in deep learning have set a new standard for EM, particularly through fine-tuning Pretrained Language Models (PLMs) and, more recently, Large Language Models (LLMs). However, fine-tuning typically requires large amounts of labeled data, which are expensive and time-consuming to obtain. In the context of e-commerce matching, labeling scarcity varies widely across domains, raising the question of how to intelligently train accurate domain-specific EM models with limited labeled data. In this work, we assume users have only a limited number of labels for a specific target domain but have access to labeled data from other domains. We introduce BEACON, a distribution-aware, budget-aware framework for low-resource EM across domains. BEACON leverages the insight that embedding representations of pairwise candidate matches can guide the effective selection of out-of-domain samples under limited in-domain supervision. We conduct extensive experiments across multiple domain-partitioned datasets derived from established EM benchmarks, demonstrating that BEACON consistently outperforms state-of-the-art methods under different training budgets.
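
The abstract does not spell out BEACON's selection procedure, so the following is only an illustrative sketch of the general idea of embedding-guided, budget-limited sample selection, not the paper's method. It assumes each candidate pair has already been embedded as a vector, and it ranks out-of-domain pairs by cosine similarity to the centroid of the few labeled in-domain pairs. The function names (`cosine`, `centroid`, `select_out_of_domain`), the centroid heuristic, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def select_out_of_domain(in_domain, out_of_domain, budget):
    """Return indices of the `budget` out-of-domain pair embeddings
    most similar to the in-domain centroid (hypothetical heuristic)."""
    target = centroid(in_domain)
    ranked = sorted(
        range(len(out_of_domain)),
        key=lambda i: cosine(out_of_domain[i], target),
        reverse=True,
    )
    return ranked[:budget]


# Toy 2-D embeddings, invented for illustration only.
in_domain = [[1.0, 0.0], [0.9, 0.1]]
out_of_domain = [[0.0, 1.0], [1.0, 0.1], [0.7, 0.7], [-1.0, 0.0]]
selected = select_out_of_domain(in_domain, out_of_domain, budget=2)
```

With these toy vectors, the two out-of-domain pairs pointing in roughly the same direction as the in-domain centroid are kept, and the orthogonal and opposite ones are dropped. A real system would presumably use PLM or LLM embeddings of serialized record pairs and a more careful, distribution-aware criterion.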

