DB · CL · IR · Apr 23, 2024

Towards Universal Dense Blocking for Entity Resolution

arXiv:2404.14831v2 · 4 citations · h-index: 29
Originality: Highly original
AI Analysis

This paper addresses the need to adapt dense blocking methods quickly across domains in entity resolution, though the contribution is incremental in that it builds on existing neural representation models.

The paper tackles the problem of domain-specific training requirements in dense blocking for entity resolution by proposing UniBlocker, a pre-trained model that, without domain-specific fine-tuning, significantly outperforms previous self- and unsupervised dense blocking methods and is comparable to state-of-the-art sparse blocking methods.

Blocking is a critical step in entity resolution, and the emergence of neural network-based representation models has led to the development of dense blocking as a promising approach for exploring deep semantics in blocking. However, previous advanced self-supervised dense blocking approaches require domain-specific training on the target domain, which limits the benefits and rapid adaptation of these methods. To address this issue, we propose UniBlocker, a dense blocker that is pre-trained on a domain-independent, easily-obtainable tabular corpus using self-supervised contrastive learning. By conducting domain-independent pre-training, UniBlocker can be adapted to various downstream blocking scenarios without requiring domain-specific fine-tuning. To evaluate the universality of our entity blocker, we also construct a new benchmark covering a wide range of blocking tasks from multiple domains and scenarios. Our experiments show that the proposed UniBlocker, without any domain-specific learning, significantly outperforms previous self- and unsupervised dense blocking methods and is comparable and complementary to the state-of-the-art sparse blocking methods.
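The two mechanisms the abstract names, contrastive pre-training and dense blocking, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: `toy_encode` is a hashed character n-gram stand-in for a pre-trained neural encoder (it is not UniBlocker), `dense_block` retrieves the top-k most similar records by cosine similarity, and `info_nce_loss` is a common in-batch contrastive objective with a hypothetical temperature of 0.05; the paper's actual loss, encoder, and retrieval settings may differ.

```python
import hashlib
import numpy as np


def toy_encode(records, dim=256, n=3):
    """Stand-in encoder (assumption): hashed character n-grams, L2-normalised.

    A real dense blocker would replace this with a pre-trained neural encoder.
    """
    vecs = np.zeros((len(records), dim), dtype=np.float64)
    for i, text in enumerate(records):
        padded = f" {text.lower()} "
        for j in range(len(padded) - n + 1):
            gram = padded[j:j + n].encode("utf-8")
            bucket = int(hashlib.md5(gram).hexdigest(), 16) % dim
            vecs[i, bucket] += 1.0
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.maximum(norms, 1e-12)


def dense_block(left, right, k=2, encode=toy_encode):
    """Return candidate pairs (i, j): each left record i with its k nearest right records j."""
    a, b = encode(left), encode(right)
    sims = a @ b.T                      # cosine similarity, since rows are unit-norm
    k = min(k, len(right))
    top = np.argsort(-sims, axis=1)[:, :k]
    return [(i, int(j)) for i in range(len(left)) for j in top[i]]


def info_nce_loss(anchors, positives, temperature=0.05):
    """In-batch contrastive loss: row i of `positives` is the positive for row i
    of `anchors`; all other rows in the batch act as negatives."""
    logits = (anchors @ positives.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


# Hypothetical records from two tables to be blocked against each other.
left = ["Apple iPhone 13 128GB blue", "Sony WH-1000XM4 headphones"]
right = [
    "iphone 13 (128 gb) - blue, apple",
    "sony wh1000xm4 wireless headphones",
    "Dell XPS 13 laptop",
]
candidate_pairs = dense_block(left, right, k=1)
print(candidate_pairs)
```

Only the candidate pairs returned by the blocker are passed on to the more expensive matching step, which is what makes blocking worthwhile on large tables. At scale, the brute-force similarity matrix would typically be replaced by an approximate nearest-neighbour index.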

Code Implementations: 2 repos
