Multi-Party Privacy-Preserving Record Linkage using Bloom Filters
The work addresses the need for secure record linkage in applications such as public health and fraud detection, but the contribution is incremental: it extends existing two-party PPRL methods to multiple parties.
The paper tackles multi-party privacy-preserving record linkage (PPRL) by proposing a protocol that combines Bloom filter encoding with distributed secure summation. The protocol efficiently identifies sets of records, held across more than two data sources, whose similarity exceeds a minimum threshold, and it is evaluated on a large real voter registration database.
Privacy-preserving record linkage (PPRL), the problem of identifying records that correspond to the same real-world entity across several data sources held by different parties without revealing any sensitive information about these records, is increasingly being required in many real-world application areas. Examples range from public health surveillance to crime and fraud detection, and national security. Various techniques have been developed to tackle the problem of PPRL, with the majority of them considering linking data from only two sources. However, in many real-world applications data from more than two sources need to be linked. In this paper we propose a viable solution for multi-party PPRL using two efficient privacy techniques: Bloom filter encoding and distributed secure summation. Our proposed protocol efficiently identifies matching sets of records held by all data sources that have a similarity above a certain minimum threshold. While being efficient, our protocol is also secure under the semi-honest adversary model in that no party can learn any sensitive information about any other parties' data, but all parties learn which of their records have a high similarity with records held by the other parties. We evaluate our protocol on a large real voter registration database showing the scalability, linkage quality, and privacy of our approach.
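The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' protocol: the abstract does not specify the encoding parameters or the similarity function, so the bigram tokenisation, the filter length, the number of hash functions, and the multi-party Dice coefficient (number of parties times common 1-bits, divided by the total number of 1-bits) are all assumptions, and the function names `qgrams`, `bloom_encode`, `secure_sum`, and `multi_party_dice` are hypothetical. As a further simplification, the common-bit count is computed in the clear, so only the summation step is masked here.

```python
import hashlib
import random


def qgrams(value, q=2):
    """Split a string into its set of overlapping q-grams (bigrams by default)."""
    return {value[i:i + q] for i in range(len(value) - q + 1)}


def bloom_encode(value, length=64, num_hashes=4):
    """Encode the q-grams of `value` as a list of 0/1 bits (assumed parameters)."""
    bits = [0] * length
    for gram in qgrams(value):
        for seed in range(num_hashes):
            # Seeded SHA-256 stands in for a family of independent hash functions.
            digest = hashlib.sha256(f"{seed}:{gram}".encode()).hexdigest()
            bits[int(digest, 16) % length] = 1
    return bits


def secure_sum(private_values, modulus=2**31, rng=None):
    """Ring-style secure summation, simulated in a single process.

    The first party adds a random mask to its value, each later party adds
    its own value to the running total, and the first party removes the mask.
    No party sees another party's individual value, only a masked partial sum.
    """
    rng = rng or random.Random()
    mask = rng.randrange(modulus)
    running = (mask + private_values[0]) % modulus
    for value in private_values[1:]:
        running = (running + value) % modulus
    return (running - mask) % modulus


def multi_party_dice(filters):
    """Assumed multi-party Dice coefficient: P * common_ones / total_ones."""
    parties = len(filters)
    # Simplification: common 1-bits are counted in the clear in this sketch.
    common = sum(1 for column in zip(*filters) if all(column))
    total = secure_sum([sum(f) for f in filters])
    return parties * common / total if total else 0.0


if __name__ == "__main__":
    names = ["smith", "smyth", "smith"]  # one record per party (made-up values)
    encoded = [bloom_encode(name) for name in names]
    print(multi_party_dice(encoded))
```

A set of records would be reported as a match when this similarity reaches the chosen minimum threshold. In a real deployment the parties would also need to agree on the hash functions and parameters in advance, and the common-bit count would have to be computed without exposing the filters themselves.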