IR · DB · Jul 26, 2018

General Context-Aware Data Matching and Merging Framework

arXiv:1807.10009v1 · 3 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the problem that existing data integration methods are often insufficient or limited to a single domain for users dealing with diverse information sources, though it appears incremental, as it builds on existing techniques and adds new metrics.

The paper tackles the problem of combining heterogeneous data from multiple sources by proposing a general framework for data matching and merging that uses context dimensions, semantics, and trust, and it reports improved overall results on five public datasets.

Owing to the numerous public information sources and services, many methods for combining heterogeneous data have been proposed recently. However, general end-to-end solutions are still rare, especially systems that take different context dimensions into account. As a result, existing techniques often prove insufficient or are limited to a certain domain. In this paper we briefly review and rigorously evaluate a general framework for data matching and merging. The framework employs collective entity resolution and redundancy elimination using three dimensions of context types. To achieve domain-independent results, data is enriched with semantics and trust. The main contribution of the paper, however, is the evaluation on five public domain-incompatible datasets. Furthermore, we introduce additional attribute, relationship, semantic and trust metrics, which allow complete framework management. Besides improving overall results within the framework, the metrics could be of independent interest.
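
To make the matching-then-merging idea concrete, here is a minimal Python sketch. It is not the paper's framework: it compares records pairwise rather than collectively, ignores context dimensions and semantics, and uses a single per-record `trust` value. The function names, the `trust` field, the 0.8 threshold, and the sample records are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (standard library)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match(rec_a: dict, rec_b: dict, threshold: float = 0.8) -> bool:
    """Treat two records as the same entity when their shared attributes
    are, on average, similar enough. The threshold is a hypothetical value."""
    shared = [k for k in rec_a if k in rec_b and k != "trust"]
    if not shared:
        return False
    score = sum(similarity(str(rec_a[k]), str(rec_b[k])) for k in shared) / len(shared)
    return score >= threshold


def merge(rec_a: dict, rec_b: dict) -> dict:
    """Merge two matched records; on conflicting attributes, the value
    from the record with the higher (hypothetical) trust wins."""
    first, second = sorted((rec_a, rec_b), key=lambda r: r["trust"], reverse=True)
    merged = {k: v for k, v in second.items() if k != "trust"}
    merged.update({k: v for k, v in first.items() if k != "trust"})
    return merged


# Two illustrative records from different (made-up) sources.
a = {"name": "Jon Smith", "city": "Ljubljana", "trust": 0.6}
b = {"name": "John Smith", "city": "Ljubljana", "email": "js@example.org", "trust": 0.9}
result = merge(a, b) if match(a, b) else None
```

In this sketch the two records match on their shared `name` and `city` attributes, and the merged record keeps the name from the more trusted source while gaining the `email` attribute that only one source provides.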
