CRSep 6, 2021
Privacy-Preserving Database FingerprintingTianxi Ji, Erman Ayday, Emre Yilmaz et al.
When sharing sensitive relational databases with other parties, a database owner aims to (i) have privacy guarantees for the database entries, (ii) have liability guarantees (via fingerprinting) in case of unauthorized sharing of its database by the recipients, and (iii) provide a high quality (utility) database to the recipients. We observe that sharing a relational database with privacy and liability guarantees are orthogonal objectives. The former can be achieved by injecting noise into the database to prevent inference of the original data values, whereas, the latter can be achieved by hiding unique marks inside the database to trace malicious parties (data recipients) who redistribute the data without the authorization. We achieve these two objectives simultaneously by proposing a novel entry-level differentially-private fingerprinting mechanism for relational databases. At a high level, the proposed mechanism fulfills the privacy and liability requirements by leveraging the randomization nature that is intrinsic to fingerprinting and achieves desired entry-level privacy guarantees. To be more specific, we devise a bit-level random response scheme to achieve differential privacy guarantee for arbitrary data entries when sharing the entire database, and then, based on this, we develop an $ε$-entry-level differentially-private fingerprinting mechanism. Next, we theoretically analyze the relationships between privacy guarantee, fingerprint robustness, and database utility by deriving closed form expressions. The outcome of this analysis allows us to bound the privacy leakage caused by attribute inference attack and characterize the privacy-utility coupling and privacy-fingerprint robustness coupling. Furthermore, we also propose a SVT-based solution to control the cumulative privacy loss when fingerprinted copies of a database are shared with multiple recipients.
ROMar 14, 2021
Design of a vision based range bearing and heading system for robot swarmsHamid Majidi Balanji, Emre Yilmaz, Omer Cakmak et al.
An essential problem of swarm robotics is how members of the swarm knows the positions of other robots. The main aim of this research is to develop a cost-effective and simple vision-based system to detect the range, bearing, and heading of the robots inside a swarm using a multi-purpose passive landmark. A small Zumo robot equipped with Raspberry Pi, PiCamera is utilized for the implementation of the algorithm, and different kinds of multipurpose passive landmarks with nonsymmetrical patterns, which give reliable information about the range, bearing and heading in a single unit, are designed. By comparing the recorded features obtained from image analysis of the landmark through systematical experimentation and the actual measurements, correlations are obtained, and algorithms converting those features into range, bearing and heading are designed. The reliability and accuracy of algorithms are tested and errors are found within an acceptable range.
CRMar 11, 2021
The Curse of Correlations for Robust Fingerprinting of Relational DatabasesTianxi Ji, Emre Yilmaz, Erman Ayday et al.
Database fingerprinting have been widely adopted to prevent unauthorized sharing of data and identify the source of data leakages. Although existing schemes are robust against common attacks, like random bit flipping and subset attack, their robustness degrades significantly if attackers utilize the inherent correlations among database entries. In this paper, we first demonstrate the vulnerability of existing database fingerprinting schemes by identifying different correlation attacks: column-wise correlation attack, row-wise correlation attack, and the integration of them. To provide robust fingerprinting against the identified correlation attacks, we then develop mitigation techniques, which can work as post-processing steps for any off-the-shelf database fingerprinting schemes. The proposed mitigation techniques also preserve the utility of the fingerprinted database considering different utility metrics. We empirically investigate the impact of the identified correlation attacks and the performance of mitigation techniques using real-world relational databases. Our results show (i) high success rates of the identified correlation attacks against existing fingerprinting schemes (e.g., the integrated correlation attack can distort 64.8\% fingerprint bits by just modifying 14.2\% entries in a fingerprinted database), and (ii) high robustness of the proposed mitigation techniques (e.g., with the mitigation techniques, the integrated correlation attack can only distort $3\%$ fingerprint bits).
CRFeb 15, 2021
Genomic Data Sharing under Dependent Local Differential PrivacyEmre Yilmaz, Tianxi Ji, Erman Ayday et al.
Privacy-preserving genomic data sharing is prominent to increase the pace of genomic research, and hence to pave the way towards personalized genomic medicine. In this paper, we introduce ($ε, T$)-dependent local differential privacy (LDP) for privacy-preserving sharing of correlated data and propose a genomic data sharing mechanism under this privacy definition. We first show that the original definition of LDP is not suitable for genomic data sharing, and then we propose a new mechanism to share genomic data. The proposed mechanism considers the correlations in data during data sharing, eliminates statistically unlikely data values beforehand, and adjusts the probability distributions for each shared data point accordingly. By doing so, we show that we can avoid an attacker from inferring the correct values of the shared data points by utilizing the correlations in the data. By adjusting the probability distributions of the shared states of each data point, we also improve the utility of shared data for the data collector. Furthermore, we develop a greedy algorithm that strategically identifies the processing order of the shared data points with the aim of maximizing the utility of the shared data. Considering the interdependent privacy risks while sharing genomic data, we also analyze the information gain of an attacker about genomes of a donor's family members by observing perturbed data of the genome donor and we propose a mechanism to select the privacy budget (i.e., $ε$ parameter of LDP) of the donor by also considering privacy preferences of her family members. Our evaluation results on a real-life genomic dataset show the superiority of the proposed mechanism compared to the randomized response mechanism (a widely used technique to achieve LDP).
CRJan 27, 2020
Collusion-Resilient Probabilistic Fingerprinting Scheme for Correlated DataEmre Yilmaz, Erman Ayday
In order to receive personalized services, individuals share their personal data with a wide range of service providers, hoping that their data will remain confidential. Thus, in case of an unauthorized distribution of their personal data by these service providers (or in case of a data breach) data owners want to identify the source of such data leakage. Digital fingerprinting schemes have been developed to embed a hidden and unique fingerprint into shared digital content, especially multimedia, to provide such liability guarantees. However, existing techniques utilize the high redundancy in the content, which is typically not included in personal data. In this work, we propose a probabilistic fingerprinting scheme that efficiently generates the fingerprint by considering a fingerprinting probability (to keep the data utility high) and publicly known inherent correlations between data points. To improve the robustness of the proposed scheme against colluding malicious service providers, we also utilize the Boneh-Shaw fingerprinting codes as a part of the proposed scheme. Furthermore, observing similarities between privacy-preserving data sharing techniques (that add controlled noise to the shared data) and the proposed fingerprinting scheme, we make a first attempt to develop a data sharing scheme that provides both privacy and fingerprint robustness at the same time. We experimentally show that fingerprint robustness and privacy have conflicting objectives and we propose a hybrid approach to control such a trade-off with a design parameter. Using the proposed hybrid approach, we show that individuals can improve their level of privacy by slightly compromising from the fingerprint robustness. We implement and evaluate the performance of the proposed scheme on real genomic data. Our experimental results show the efficiency and robustness of the proposed scheme.
NENov 19, 2019
Deep Spiking Neural Networks for Large Vocabulary Automatic Speech RecognitionJibin Wu, Emre Yilmaz, Malu Zhang et al.
Artificial neural networks (ANN) have become the mainstream acoustic modeling technique for large vocabulary automatic speech recognition (ASR). A conventional ANN features a multi-layer architecture that requires massive amounts of computation. The brain-inspired spiking neural networks (SNN) closely mimic the biological neural networks and can operate on low-power neuromorphic hardware with spike-based computation. Motivated by their unprecedented energyefficiency and rapid information processing capability, we explore the use of SNNs for speech recognition. In this work, we use SNNs for acoustic modeling and evaluate their performance on several large vocabulary recognition scenarios. The experimental results demonstrate competitive ASR accuracies to their ANN counterparts, while require significantly reduced computational cost and inference time. Integrating the algorithmic power of deep SNNs with energy-efficient neuromorphic hardware, therefore, offer an attractive solution for ASR applications running locally on mobile and embedded devices.
LGMay 3, 2019
Locally Differentially Private Naive Bayes ClassificationEmre Yilmaz, Mohammad Al-Rubaie, J. Morris Chang
In machine learning, classification models need to be trained in order to predict class labels. When the training data contains personal information about individuals, collecting training data becomes difficult due to privacy concerns. Local differential privacy is a definition to measure the individual privacy when there is no trusted data curator. Individuals interact with an untrusted data aggregator who obtains statistical information about the population without learning personal data. In order to train a Naive Bayes classifier in an untrusted setting, we propose to use methods satisfying local differential privacy. Individuals send their perturbed inputs that keep the relationship between the feature values and class labels. The data aggregator estimates all probabilities needed by the Naive Bayes classifier. Then, new instances can be classified based on the estimated probabilities. We propose solutions for both discrete and continuous data. In order to eliminate high amount of noise and decrease communication cost in multi-dimensional data, we propose utilizing dimensionality reduction techniques which can be applied by individuals before perturbing their inputs. Our experimental results show that the accuracy of the Naive Bayes classifier is maintained even when the individual privacy is guaranteed under local differential privacy, and that using dimensionality reduction enhances the accuracy.
ASApr 16, 2019
I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared ExperiencesKong Aik Lee, Ville Hautamaki, Tomi Kinnunen et al.
The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE). The latest edition of such joint submission was in SRE 2018, in which the I4U submission was among the best-performing systems. SRE'18 also marks the 10-year anniversary of I4U consortium into NIST SRE series of evaluation. The primary objective of the current paper is to summarize the results and lessons learned based on the twelve sub-systems and their fusion submitted to SRE'18. It is also our intention to present a shared view on the advancements, progresses, and major paradigm shifts that we have witnessed as an SRE participant in the past decade from SRE'08 to SRE'18. In this regard, we have seen, among others, a paradigm shift from supervector representation to deep speaker embedding, and a switch of research challenge from channel compensation to domain adaptation.
CLJul 23, 2018
ASR-free CNN-DTW keyword spotting using multilingual bottleneck features for almost zero-resource languagesRaghav Menon, Herman Kamper, Emre Yilmaz et al.
We consider multilingual bottleneck features (BNFs) for nearly zero-resource keyword spotting. This forms part of a United Nations effort using keyword spotting to support humanitarian relief programmes in parts of Africa where languages are severely under-resourced. We use 1920 isolated keywords (40 types, 34 minutes) as exemplars for dynamic time warping (DTW) template matching, which is performed on a much larger body of untranscribed speech. These DTW costs are used as targets for a convolutional neural network (CNN) keyword spotter, giving a much faster system than direct DTW. Here we consider how available data from well-resourced languages can improve this CNN-DTW approach. We show that multilingual BNFs trained on ten languages improve the area under the ROC curve of a CNN-DTW system by 10.9% absolute relative to the MFCC baseline. By combining low-resource DTW-based supervision with information from well-resourced languages, CNN-DTW is a competitive option for low-resource keyword spotting.
CRJan 6, 2018
Privacy-Preserving Aggregate Queries for Optimal Location SelectionEmre Yilmaz, Hakan Ferhatosmanoglu, Erman Ayday et al.
Today, vast amounts of location data are collected by various service providers. These location data owners have a good idea of where their users are most of the time. Other businesses also want to use this information for location analytics, such as finding the optimal location for a new branch. However, location data owners cannot share their data with other businesses, mainly due to privacy and legal concerns. In this paper, we propose privacy-preserving solutions in which location-based queries can be answered by data owners without sharing their data with other businesses and without accessing sensitive information such as the customer list of the businesses that send the query. We utilize a partially homomorphic cryptosystem as the building block of the proposed protocols. We prove the security of the protocols in semi-honest threat model. We also explain how to achieve differential privacy in the proposed protocols and discuss its impact on utility. We evaluate the performance of the protocols with real and synthetic datasets and show that the proposed solutions are highly practical. The proposed solutions will facilitate an effective sharing of sensitive data between entities and joint analytics in a wide range of applications without violating their customers' privacy.