CR, DB, LG | Dec 5, 2024

Multi-Layer Privacy-Preserving Record Linkage with Clerical Review based on gradual information disclosure

arXiv:2412.04178v1 | 1 citation | h-index: 8 | BTW
Originality: Incremental advance
AI Analysis

This work addresses the integration of sensitive data, offering an incremental improvement by adding human-in-the-loop review to Privacy-Preserving Record Linkage (PPRL).

The paper aims to improve linkage quality in Privacy-Preserving Record Linkage (PPRL) by integrating clerical review through a multi-layer active learning process. On real-world datasets, it reports considerable linkage quality improvements with limited labeling effort and limited privacy risk.

Privacy-Preserving Record Linkage (PPRL) is an essential component of data integration tasks involving sensitive information. The linkage quality determines the usability of the combined datasets and of the (machine learning) applications built on them. We present a novel privacy-preserving protocol that integrates clerical review into PPRL using a multi-layer active learning process. Uncertain match candidates are reviewed on several layers by human and non-human oracles to reduce the amount of information disclosed per record and in total. Predictions are propagated back to update previous layers, which also improves linkage performance for non-reviewed candidates. The data owners remain in control of the amount of information they share for each record, so our approach follows the need-to-know and data sovereignty principles. The experimental evaluation on real-world datasets shows considerable linkage quality improvements with limited labeling effort and privacy risk.
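The layered review idea in the abstract can be illustrated with a small sketch. This is not the paper's protocol: the `Candidate` type, the `review` and `propagate_threshold` functions, the two fixed thresholds, and the midpoint rule for feeding labels back are all hypothetical simplifications chosen for illustration. The sketch only shows the general shape: clear cases are decided automatically, uncertain pairs are escalated through a list of oracles one layer at a time, and the labels gained from review are used to adjust the decision boundary for pairs that were never reviewed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    pair_id: str
    score: float                  # similarity computed on encoded records
    label: Optional[bool] = None  # True = match, False = non-match
    disclosed_layers: int = 0     # number of review layers that saw this pair


# An oracle returns True (match), False (non-match), or None (cannot decide).
Oracle = Callable[[Candidate], Optional[bool]]


def review(candidates: List[Candidate], oracles: List[Oracle],
           low: float, high: float) -> List[Candidate]:
    """Label clear cases by threshold; escalate uncertain ones layer by layer."""
    for c in candidates:
        if c.score >= high:
            c.label = True
        elif c.score < low:
            c.label = False
        else:
            # Assumption: each later oracle sees more information, so we stop
            # at the first layer that is able to reach a decision.
            for oracle in oracles:
                c.disclosed_layers += 1
                decision = oracle(c)
                if decision is not None:
                    c.label = decision
                    break
    return candidates


def propagate_threshold(candidates: List[Candidate], default: float) -> float:
    """Stand-in for back-propagation of predictions: place a single decision
    threshold midway between reviewed non-matches and reviewed matches."""
    reviewed = [c for c in candidates
                if c.disclosed_layers > 0 and c.label is not None]
    matches = [c.score for c in reviewed if c.label]
    non_matches = [c.score for c in reviewed if not c.label]
    if not matches or not non_matches:
        return default
    return (min(matches) + max(non_matches)) / 2
```

In this toy version, the sum of `disclosed_layers` over all candidates serves as a rough proxy for how much information was revealed in total, which is the quantity the layered design tries to keep small.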
