DB · CL · Feb 21, 2020

Crowdsourced Collective Entity Resolution with Relational Match Propagation

arXiv:2002.09361v1 · 6 citations
AI Analysis

This paper addresses the challenge of efficiently resolving entities in knowledge bases, which matters for applications that require accurate data integration, though it is an incremental improvement over existing human-in-the-loop methods.

The paper tackles the problem of high labor costs and insufficient labeling in crowdsourced entity resolution by proposing a collective approach that leverages relationships between entities to infer matches jointly, achieving superior accuracy with much less labeling compared to state-of-the-art methods.

Knowledge bases (KBs) store rich yet heterogeneous entities and facts. Entity resolution (ER) aims to identify entities in KBs which refer to the same real-world object. Recent studies have shown significant benefits of involving humans in the loop of ER. They often resolve entities with pairwise similarity measures over attribute values and resort to the crowds to label uncertain ones. However, existing methods still suffer from high labor costs and insufficient labeling to some extent. In this paper, we propose a novel approach called crowdsourced collective ER, which leverages the relationships between entities to infer matches jointly rather than independently. Specifically, it iteratively asks human workers to label picked entity pairs and propagates the labeling information to their neighbors in distance. During this process, we address the problems of candidate entity pruning, probabilistic propagation, optimal question selection and error-tolerant truth inference. Our experiments on real-world datasets demonstrate that, compared with state-of-the-art methods, our approach achieves superior accuracy with much less labeling.
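
The ask-then-propagate loop described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the function names (`propagate`, `resolve`), the decay parameter `strength`, the hop limit `max_hops`, and the toy data are all hypothetical. Question selection is replaced here by plain uncertainty sampling, propagation by a noisy-OR style update, and error-tolerant truth inference is omitted by assuming a single worker who always answers correctly.

```python
from collections import deque


def propagate(prob, neighbors, labeled, source, strength=0.8, max_hops=2):
    """Spread a positive label from `source` to unlabeled pairs within `max_hops`."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        pair, hops = queue.popleft()
        if hops == max_hops:
            continue
        for nxt in neighbors.get(pair, ()):
            if nxt in seen or nxt in labeled:
                continue  # never overwrite an answer the crowd already gave
            seen.add(nxt)
            weight = strength ** (hops + 1)  # influence decays with distance
            # Noisy-OR style update: positive evidence can only raise the belief.
            prob[nxt] = 1.0 - (1.0 - prob[nxt]) * (1.0 - weight)
            queue.append((nxt, hops + 1))


def resolve(prior, neighbors, ask, budget, threshold=0.5):
    """Ask up to `budget` questions and return the pairs believed to match."""
    prob = dict(prior)
    labeled = set()
    for _ in range(budget):
        unlabeled = [p for p in prob if p not in labeled]
        if not unlabeled:
            break
        # Assumption: uncertainty sampling stands in for the paper's
        # question selection, which the abstract does not spell out.
        question = min(unlabeled, key=lambda p: abs(prob[p] - 0.5))
        is_match = ask(question)
        labeled.add(question)
        prob[question] = 1.0 if is_match else 0.0
        if is_match:  # this sketch only propagates positive answers
            propagate(prob, neighbors, labeled, question)
    return {p for p, belief in prob.items() if belief >= threshold}


# Toy data (hypothetical): one paper pair linked to an author pair and a
# venue pair, plus an unrelated author pair.
PAPER, AUTHOR, VENUE, OTHER = ("p1", "q1"), ("a1", "b1"), ("v1", "w1"), ("a2", "b2")
prior = {PAPER: 0.5, AUTHOR: 0.3, VENUE: 0.4, OTHER: 0.2}
neighbors = {PAPER: [AUTHOR, VENUE], AUTHOR: [PAPER], VENUE: [PAPER]}
truth = {PAPER, AUTHOR, VENUE}

matches = resolve(prior, neighbors, ask=lambda pair: pair in truth, budget=1)
```

With a budget of one question, the sketch asks about the paper pair and the positive answer lifts the related author and venue pairs above the threshold, while the unrelated pair keeps its low prior. That is the intuition behind inferring matches jointly rather than independently.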

Code Implementations: 1 repo
