CR Sep 6, 2021
Privacy-Preserving Database Fingerprinting
Tianxi Ji, Erman Ayday, Emre Yilmaz et al.
When sharing sensitive relational databases with other parties, a database owner aims to (i) have privacy guarantees for the database entries, (ii) have liability guarantees (via fingerprinting) in case of unauthorized sharing of its database by the recipients, and (iii) provide a high-quality (high-utility) database to the recipients. We observe that sharing a relational database with privacy guarantees and sharing it with liability guarantees are orthogonal objectives. The former can be achieved by injecting noise into the database to prevent inference of the original data values, whereas the latter can be achieved by hiding unique marks inside the database to trace malicious parties (data recipients) who redistribute the data without authorization. We achieve these two objectives simultaneously by proposing a novel entry-level differentially-private fingerprinting mechanism for relational databases. At a high level, the proposed mechanism fulfills the privacy and liability requirements by leveraging the randomization that is intrinsic to fingerprinting, and thereby achieves the desired entry-level privacy guarantees. More specifically, we devise a bit-level random response scheme to achieve a differential privacy guarantee for arbitrary data entries when sharing the entire database and, based on this scheme, we develop an $ε$-entry-level differentially-private fingerprinting mechanism. Next, we theoretically analyze the relationships between privacy guarantee, fingerprint robustness, and database utility by deriving closed-form expressions. The outcome of this analysis allows us to bound the privacy leakage caused by attribute inference attacks and to characterize the privacy-utility coupling and the privacy-fingerprint robustness coupling. Furthermore, we propose an SVT-based solution to control the cumulative privacy loss when fingerprinted copies of a database are shared with multiple recipients.
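The bit-level randomization mentioned in the abstract above can be illustrated with textbook binary randomized response. The following is a minimal sketch under stated assumptions, not the paper's mechanism: the function names, the choice of perturbing only the least significant bits, and the per-bit budget `epsilon` are all hypothetical. Each selected bit is kept with probability $e^{ε}/(1+e^{ε})$ and flipped otherwise, which is the standard construction that gives an $ε$ guarantee for a single bit.

```python
import math
import random


def keep_probability(epsilon: float) -> float:
    """Probability of reporting one bit unchanged under binary randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize_bits(value: int, num_bits: int, epsilon: float, rng: random.Random) -> int:
    """Independently flip each of the `num_bits` least significant bits of `value`.

    Assumption: entries are non-negative integers and only their low-order
    bits are perturbed; higher bits are left untouched.
    """
    p_keep = keep_probability(epsilon)
    for i in range(num_bits):
        if rng.random() >= p_keep:  # flip with probability 1 - p_keep
            value ^= 1 << i
    return value


# Hypothetical usage: perturb the two lowest bits of a few integer entries.
rng = random.Random(0)
noisy_entries = [randomize_bits(v, num_bits=2, epsilon=1.0, rng=rng) for v in [12, 7, 30]]
```

The guarantee in this sketch is per bit; perturbing several bits of the same entry consumes budget for each of them, so the entry-level accounting derived in the paper is not reproduced here.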
CR Mar 11, 2021
The Curse of Correlations for Robust Fingerprinting of Relational Databases
Tianxi Ji, Emre Yilmaz, Erman Ayday et al.
Database fingerprinting has been widely adopted to prevent unauthorized sharing of data and to identify the source of data leakages. Although existing schemes are robust against common attacks, such as random bit flipping and subset attacks, their robustness degrades significantly if attackers exploit the inherent correlations among database entries. In this paper, we first demonstrate the vulnerability of existing database fingerprinting schemes by identifying different correlation attacks: the column-wise correlation attack, the row-wise correlation attack, and their integration. To provide robust fingerprinting against the identified correlation attacks, we then develop mitigation techniques, which can work as post-processing steps for any off-the-shelf database fingerprinting scheme. The proposed mitigation techniques also preserve the utility of the fingerprinted database under different utility metrics. We empirically investigate the impact of the identified correlation attacks and the performance of the mitigation techniques using real-world relational databases. Our results show (i) high success rates of the identified correlation attacks against existing fingerprinting schemes (e.g., the integrated correlation attack can distort 64.8% of the fingerprint bits by modifying just 14.2% of the entries in a fingerprinted database), and (ii) high robustness of the proposed mitigation techniques (e.g., with the mitigation techniques, the integrated correlation attack can distort only 3% of the fingerprint bits).
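To make the idea of exploiting correlations concrete, here is a toy sketch of how an attacker with prior knowledge of the joint distribution of two columns might flag entries that look modified. It is an illustration under assumptions, not the attack described in the paper: the function name `flag_unlikely_pairs`, the `prior` table, and the fixed `threshold` are all hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def flag_unlikely_pairs(rows: List[Pair], prior: Dict[Pair, float], threshold: float) -> List[int]:
    """Return the indices of rows whose (column A, column B) value pair has
    prior joint probability below `threshold`.

    Assumption: the attacker already holds `prior`, an estimate of the joint
    distribution of the two columns; pairs missing from it count as probability 0.
    """
    return [i for i, pair in enumerate(rows) if prior.get(pair, 0.0) < threshold]


# Hypothetical data: the two columns usually agree, so a mismatch is suspicious.
prior = {(0, 0): 0.45, (1, 1): 0.45, (0, 1): 0.05, (1, 0): 0.05}
rows = [(0, 0), (1, 1), (0, 1)]
suspects = flag_unlikely_pairs(rows, prior, threshold=0.1)
```

In this toy setting the attacker would then alter only the flagged entries, which is one way a small number of modifications could hit a disproportionate share of fingerprinted positions.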
CR Feb 15, 2021
Genomic Data Sharing under Dependent Local Differential Privacy
Emre Yilmaz, Tianxi Ji, Erman Ayday et al.
Privacy-preserving genomic data sharing is important for increasing the pace of genomic research, and hence for paving the way towards personalized genomic medicine. In this paper, we introduce ($ε, T$)-dependent local differential privacy (LDP) for privacy-preserving sharing of correlated data and propose a genomic data sharing mechanism under this privacy definition. We first show that the original definition of LDP is not suitable for genomic data sharing, and then we propose a new mechanism to share genomic data. The proposed mechanism considers the correlations in the data during sharing, eliminates statistically unlikely data values beforehand, and adjusts the probability distributions for each shared data point accordingly. By doing so, we show that we can prevent an attacker from inferring the correct values of the shared data points by exploiting the correlations in the data. By adjusting the probability distributions of the shared states of each data point, we also improve the utility of the shared data for the data collector. Furthermore, we develop a greedy algorithm that strategically identifies the processing order of the shared data points with the aim of maximizing the utility of the shared data. Considering the interdependent privacy risks of sharing genomic data, we also analyze the information an attacker gains about the genomes of a donor's family members by observing the perturbed data of the genome donor, and we propose a mechanism that selects the donor's privacy budget (i.e., the $ε$ parameter of LDP) by also considering the privacy preferences of her family members. Our evaluation results on a real-life genomic dataset show the superiority of the proposed mechanism over the randomized response mechanism (a widely used technique to achieve LDP).
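For reference, the randomized response baseline named at the end of the abstract can be sketched in its generalized (k-ary) form. This is a minimal illustration of the standard baseline only, not the proposed dependent-LDP mechanism. The function names are hypothetical, and the three-value domain {0, 1, 2} for a genomic data point is an assumption made for the example. The true value is reported with probability $e^{ε}/(e^{ε}+k-1)$ and each other value with probability $1/(e^{ε}+k-1)$.

```python
import math
import random
from typing import Dict, Sequence


def grr_probabilities(true_value: int, domain: Sequence[int], epsilon: float) -> Dict[int, float]:
    """Reporting distribution of k-ary randomized response over `domain`."""
    k = len(domain)
    denom = math.exp(epsilon) + k - 1
    p_true = math.exp(epsilon) / denom  # probability of reporting the true value
    p_other = 1.0 / denom               # probability of each other value
    return {v: (p_true if v == true_value else p_other) for v in domain}


def grr_report(true_value: int, domain: Sequence[int], epsilon: float, rng: random.Random) -> int:
    """Sample one perturbed report for `true_value`."""
    probs = grr_probabilities(true_value, domain, epsilon)
    values = list(probs)
    return rng.choices(values, weights=[probs[v] for v in values], k=1)[0]


# Hypothetical usage: one data point with assumed domain {0, 1, 2}.
domain = [0, 1, 2]
shared = grr_report(true_value=1, domain=domain, epsilon=1.0, rng=random.Random(0))
```

Because this baseline treats every value in the domain as a possible report regardless of context, it ignores correlations between data points, which is the limitation the abstract says the proposed mechanism addresses.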