HC DB Mar 15, 2018

r-HUMO: A Risk-Aware Human-Machine Cooperation Framework for Entity Resolution with Quality Guarantees

Boyi Hou, Qun Chen, Zhaoqiang Chen, Youcef Nafa, Zhanhuai Li

arXiv:1803.05714v3 11.712 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of reliable entity resolution for data integration and cleaning tasks, offering an incremental improvement over existing human-machine cooperation methods.

The paper tackles the challenge of ensuring quality guarantees in entity resolution. It proposes r-HUMO, a risk-aware human-machine cooperation framework that selects the human workload based on real-time risk analysis, and it reports better quality control at reduced human cost than state-of-the-art alternatives.

Even though many approaches have been proposed for entity resolution (ER), it remains very challenging to find one with quality guarantees. To this end, we propose a risk-aware HUman-Machine cOoperation framework for ER, denoted by r-HUMO. Built on the existing HUMO framework, r-HUMO similarly enforces both precision and recall levels by partitioning an ER workload between the human and the machine. However, r-HUMO is the first solution to optimize the process of human workload selection from a risk perspective. It iteratively selects human workload based on real-time risk analysis on human-labeled results as well as prespecified machine metrics. In this paper, we first introduce the r-HUMO framework and then present the risk analysis technique to prioritize the instances for manual labeling. Finally, we empirically evaluate r-HUMO's performance on real data. Our extensive experiments show that r-HUMO is effective in enforcing quality guarantees, and compared with the state-of-the-art alternatives, it can achieve better quality control with reduced human cost.
