DB | AI | LG | Aug 25, 2015

ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution

arXiv:1508.06013v1 | 34 citations
Originality Synthesis-oriented
AI Analysis

This work addresses data cleaning for database management, but it is incremental as it combines existing methods without introducing a new paradigm.

The paper tackles entity resolution by integrating machine learning classifiers with matching dependencies, using LogiQL for data processing; the combined approach supports both the blocking and the merging phases.

Entity resolution (ER), an important and common data cleaning problem, is about detecting duplicate data representations of the same external entities and merging them into single representations. Relatively recently, declarative rules called matching dependencies (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this work we show the process and the benefits of integrating three components of ER: (a) classifiers for duplicate/non-duplicate record pairs built using machine learning (ML) techniques; (b) MDs for supporting both the blocking phase of ML and the merge itself; and (c) the use of the declarative language LogiQL (an extended form of Datalog supported by the LogicBlox platform) for data processing and for the specification and enforcement of MDs.
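The three phases named in the abstract (blocking, pairwise duplicate classification, and merging) can be illustrated with a minimal sketch. This is not the paper's implementation: it is written in Python rather than LogiQL, the blocking key is a simple hypothetical one rather than MD-based blocking, the "classifier" is a fixed similarity threshold standing in for a trained ML model, and `merge_value` is a hypothetical matching function. All function names and the sample records are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """String similarity in [0, 1]; a stand-in for the similarity relations used in MDs."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def block_key(record):
    """Hypothetical blocking key: first letter of the last token of the name."""
    return record["name"].split()[-1][0].lower()


def is_duplicate(r1, r2, threshold=0.8):
    """Placeholder for a trained classifier: a fixed threshold on name similarity."""
    return similarity(r1["name"], r2["name"]) >= threshold


def merge_value(v1, v2):
    """Hypothetical matching function: keep the longer of the two values."""
    return v1 if len(v1) >= len(v2) else v2


def resolve(records):
    # Phase 1 (blocking): only records sharing a key are compared.
    blocks = {}
    for idx, rec in enumerate(records):
        blocks.setdefault(block_key(rec), []).append(idx)

    # Union-find structure to group records judged to be duplicates.
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Phase 2 (classification): label candidate pairs inside each block.
    for members in blocks.values():
        for i, j in combinations(members, 2):
            if is_duplicate(records[i], records[j]):
                parent[find(j)] = find(i)

    # Phase 3 (merge): combine attribute values of each duplicate group.
    merged = {}
    for idx, rec in enumerate(records):
        root = find(idx)
        if root not in merged:
            merged[root] = dict(rec)
        else:
            for attr, value in rec.items():
                merged[root][attr] = merge_value(merged[root][attr], value)
    return list(merged.values())


records = [
    {"name": "John Smith", "affiliation": "MIT"},
    {"name": "Jon Smith", "affiliation": "Massachusetts Institute of Technology"},
    {"name": "Mary Jones", "affiliation": "Carleton University"},
]
resolved = resolve(records)
```

In this toy run the two "Smith" records fall into the same block, are judged duplicates, and are merged into one record, while "Mary Jones" is never compared against them. Blocking is what keeps the number of pairwise comparisons manageable on larger inputs.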

