CL · AI · May 25

Adaptive Graph Refinement and Label Propagation with LLMs for Cost-Effective Entity Resolution

arXiv:2605.25814
AI Analysis

For data management practitioners, Alper offers a cost-effective, unified framework that improves entity resolution accuracy by overcoming error propagation in traditional cascaded pipelines.

Alper unifies matching and clustering for dirty entity resolution via iterative probabilistic label propagation over an evolving graph, adaptively combining cheap graph signals with expensive LLM queries under a budget. It consistently outperforms state-of-the-art cascaded pipelines on eight benchmarks.

Dirty entity resolution (ER), which identifies records referring to the same real-world entity from a single, messy dataset, is a fundamental task in data management and mining. However, the dominant blocking-matching-clustering paradigm for ER suffers from critical flaws. Its cascaded, decoupled workflow essentially produces a static, sparse graph plagued by missing edges (due to blocking failures) and noisy links (due to matching errors), causing error propagation and yielding suboptimal clusters, particularly when rigid transitivity is imposed in the clustering. We contend that matching and clustering are fundamentally synergistic, both optimizing for the construction of an ideal entity graph. Building upon this insight, we propose Alper, a unified framework that integrates these steps into an iterative probabilistic label propagation process over a global, evolving graph. Unlike disjoint blocking, Alper refines the graph structure and labels dynamically by adaptively integrating "weak but cheap" signals from graph propagation with "strong but expensive" LLM-based pairwise queries. For higher cost-effectiveness, we formulate the signal selection as a constrained optimization problem maximizing cumulative marginal gain under a query budget, solved via our greedy algorithm with provable theoretical guarantees. Our extensive experiments over eight benchmark datasets demonstrate that Alper is consistently superior to state-of-the-art cascaded pipelines.
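The budgeted signal selection described above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes the gain of querying a pair can be approximated by the binary entropy of its current match probability, and that gains are independent across pairs, so greedy selection reduces to taking the highest-entropy pairs. The names `binary_entropy`, `greedy_select`, `refine`, and the `oracle` callable (a stand-in for an LLM pairwise query) are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]


def binary_entropy(p: float) -> float:
    """Uncertainty (in bits) of a match probability; 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def greedy_select(match_prob: Dict[Pair, float], budget: int) -> List[Pair]:
    """Pick up to `budget` pairs with the largest assumed gain.

    Assumption: gain is the entropy of the current estimate and is
    independent per pair, so greedy selection is a top-k by entropy.
    """
    ranked = sorted(
        match_prob,
        key=lambda pair: binary_entropy(match_prob[pair]),
        reverse=True,
    )
    return ranked[:budget]


def refine(
    match_prob: Dict[Pair, float],
    oracle: Callable[[Pair], bool],
    budget: int,
) -> Dict[Pair, float]:
    """Replace the cheap estimate with the oracle's answer for selected pairs."""
    refined = dict(match_prob)
    for pair in greedy_select(match_prob, budget):
        # Hypothetical expensive query, e.g. an LLM asked "same entity?"
        refined[pair] = 1.0 if oracle(pair) else 0.0
    return refined


# Cheap graph-derived estimates for four candidate pairs (illustrative values).
estimates = {
    ("a", "b"): 0.50,
    ("a", "c"): 0.90,
    ("b", "c"): 0.45,
    ("c", "d"): 0.02,
}
selected = greedy_select(estimates, budget=2)
updated = refine(estimates, oracle=lambda pair: pair == ("a", "b"), budget=2)
```

With a budget of two, the sketch spends its queries on the two most uncertain pairs and leaves the confident estimates untouched. A fuller version would recompute gains after each query, since resolving one pair changes what propagation implies about its neighbours.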
