CRJul 5, 2023
SoK: Privacy-Preserving Data SynthesisYuzheng Hu, Fan Wu, Qinbin Li et al.
As the prevalence of data analysis grows, safeguarding data privacy has become a paramount concern. Consequently, there has been an upsurge in the development of mechanisms aimed at privacy-preserving data analyses. However, these approaches are task-specific; designing algorithms for new tasks is a cumbersome process. As an alternative, one can create synthetic data that is (ideally) devoid of private information. This paper focuses on privacy-preserving data synthesis (PPDS) by providing a comprehensive overview, analysis, and discussion of the field. Specifically, we put forth a master recipe that unifies two prominent strands of research in PPDS: statistical methods and deep learning (DL)-based methods. Under the master recipe, we further dissect the statistical methods into choices of modeling and representation, and investigate the DL-based methods by different generative modeling principles. To consolidate our findings, we provide comprehensive reference tables, distill key takeaways, and identify open problems in the existing literature. In doing so, we aim to answer the following questions: What are the design principles behind different PPDS methods? How can we categorize these methods, and what are the advantages and disadvantages associated with each category? Can we provide guidelines for method selection in different real-world scenarios? We proceed to benchmark several prominent DL-based methods on the task of private image synthesis and conclude that DP-MERF is an all-purpose approach. Finally, upon systematizing the work over the past decade, we identify future directions and call for actions from researchers.
CRSep 22, 2021Code
Do I Get the Privacy I Need? Benchmarking Utility in Differential Privacy LibrariesGonzalo Munilla Garrido, Joseph Near, Aitsam Muhammad et al.
An increasing number of open-source libraries promise to bring differential privacy to practice, even for non-experts. This paper studies five libraries that offer differentially private analytics: Google DP, SmartNoise, diffprivlib, diffpriv, and Chorus. We compare these libraries qualitatively (capabilities, features, and maturity) and quantitatively (utility and scalability) across four analytics queries (count, sum, mean, and variance) executed on synthetic and real-world datasets. We conclude that these libraries provide similar utility (except in some notable scenarios). However, there are significant differences in the features provided, and we find that no single library excels in all areas. Based on our results, we provide guidance for practitioners to help in choosing a suitable library, guidance for library designers to enhance their software, and guidance for researchers on open challenges in differential privacy tools for non-experts.
CRJan 11, 2022
Exponential Randomized Response: Boosting Utility in Differentially Private SelectionGonzalo Munilla Garrido, Florian Matthes
A differentially private selection algorithm outputs from a finite set the item that approximately maximizes a data-dependent quality function. The most widely adopted mechanisms tackling this task are the pioneering exponential mechanism and permute-and-flip, which can offer utility improvements of up to a factor of two over the exponential mechanism. This work introduces a new differentially private mechanism for private selection and conducts theoretical and empirical comparisons with the above mechanisms. For reasonably common scenarios, our mechanism can provide utility improvements of factors significantly larger than two over the exponential and permute-and-flip mechanisms. Because the utility can deteriorate in niche scenarios, we recommend our mechanism to analysts who can tolerate lower utility for some datasets.
CRJul 25, 2021
Revealing the Landscape of Privacy-Enhancing Technologies in the Context of Data Markets for the IoT: A Systematic Literature ReviewGonzalo Munilla Garrido, Johannes Sedlmeir, Ömer Uludağ et al.
IoT data markets in public and private institutions have become increasingly relevant in recent years because of their potential to improve data availability and unlock new business models. However, exchanging data in markets bears considerable challenges related to disclosing sensitive information. Despite considerable research focused on different aspects of privacy-enhancing data markets for the IoT, none of the solutions proposed so far seems to find a practical adoption. Thus, this study aims to organize the state-of-the-art solutions, analyze and scope the technologies that have been suggested in this context, and structure the remaining challenges to determine areas where future research is required. To accomplish this goal, we conducted a systematic literature review on privacy enhancement in data markets for the IoT, covering 50 publications dated up to July 2020, and provided updates with 24 publications dated up to May 2022. Our results indicate that most research in this area has emerged only recently, and no IoT data market architecture has established itself as canonical. Existing solutions frequently lack the required combination of anonymization and secure computation technologies. Furthermore, there is no consensus on the appropriate use of blockchain technology for IoT data markets and a low degree of leveraging existing libraries or reusing generic data market architectures. We also identified significant challenges remaining, such as the copy problem and the recursive enforcement problem that-while solutions have been suggested to some extent-are often not sufficiently addressed in proposed designs. We conclude that privacy-enhancing technologies need further improvements to positively impact data markets so that, ultimately, the value of data is preserved through data scarcity and users' privacy and businesses-critical information are protected.