Multi-Layer Privacy-Preserving Record Linkage with Clerical Review Based on Gradual Information Disclosure
This work addresses data integration tasks for sensitive information, offering an incremental improvement by enhancing PPRL with human-in-the-loop review.
The paper aims to improve linkage quality in Privacy-Preserving Record Linkage (PPRL) by integrating clerical review through a multi-layer active learning process. On real-world datasets, this yields considerable linkage quality improvements with limited labeling effort and limited privacy risk.
Privacy-Preserving Record Linkage (PPRL) is an essential component of data integration tasks involving sensitive information. The linkage quality determines the usability of the combined datasets and of the (machine learning) applications built on them. We present a novel privacy-preserving protocol that integrates clerical review into PPRL using a multi-layer active learning process. Uncertain match candidates are reviewed on several layers by human and non-human oracles, which reduces the amount of information disclosed both per record and in total. Predictions are propagated back to update previous layers, which improves linkage performance for non-reviewed candidates as well. The data owners remain in control of the amount of information they share for each record, so our approach follows the need-to-know and data sovereignty principles. The experimental evaluation on real-world datasets shows considerable linkage quality improvements with limited labeling effort and limited privacy risk.
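The layered review idea can be illustrated with a minimal sketch. This is not the protocol from the paper: the function name `review_in_layers`, the thresholds `lower` and `upper`, and the oracle interface are all hypothetical, chosen only to show how an uncertain candidate might escalate to a layer that sees more information only when the previous layer could not decide. The back-propagation of predictions to earlier layers is deliberately left out.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical oracle interface: True = match, False = non-match, None = undecided.
Oracle = Callable[[str], Optional[bool]]


def review_in_layers(
    candidates: Dict[str, float],
    layers: List[Oracle],
    lower: float = 0.4,   # assumed threshold: below this, classify as non-match
    upper: float = 0.8,   # assumed threshold: at or above this, classify as match
) -> Tuple[Dict[str, Optional[bool]], Dict[str, int]]:
    """Classify candidate pairs, escalating only uncertain ones through the layers.

    `layers` is ordered from least to most information disclosed.
    Returns the labels and, per pair, how many layers saw it.
    """
    labels: Dict[str, Optional[bool]] = {}
    disclosed: Dict[str, int] = {}
    for pair_id, score in candidates.items():
        disclosed[pair_id] = 0
        if score >= upper:
            labels[pair_id] = True       # confident match, nothing disclosed
            continue
        if score < lower:
            labels[pair_id] = False      # confident non-match, nothing disclosed
            continue
        labels[pair_id] = None           # uncertain: start the layered review
        for depth, oracle in enumerate(layers, start=1):
            disclosed[pair_id] = depth
            decision = oracle(pair_id)
            if decision is not None:     # stop at the first layer that decides
                labels[pair_id] = decision
                break
    return labels, disclosed


# Toy example with made-up scores and two stand-in oracles.
candidates = {"a": 0.95, "b": 0.6, "c": 0.5, "d": 0.1}
layer1 = lambda pair_id: True if pair_id == "b" else None  # decides only "b"
layer2 = lambda pair_id: False                             # always decides
labels, disclosed = review_in_layers(candidates, [layer1, layer2])
```

In this toy run, only the pairs with scores between the two thresholds reach an oracle, and only the pair the first oracle cannot decide is passed on to the second. The `disclosed` counter is one simple way to track how deep each pair went, which mirrors the need-to-know idea at a very coarse level.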