CRDBJul 4, 2020

PPaaS: Privacy Preservation as a Service

arXiv:2007.02013v312 citations
AI Analysis

This addresses privacy concerns in data sharing for machine learning and analytics, though it appears incremental as it builds on existing perturbation methods.

The paper tackles the challenge of selecting the best privacy-preservation approach for data sharing by proposing PPaaS, a framework that uses selective data perturbation to balance privacy and utility, resulting in a generic platform for sanitizing big data with application-specific granularity.

Personally identifiable information (PII) can find its way into cyberspace through various channels, and many potential sources can leak such information. Data sharing (e.g. cross-agency data sharing) for machine learning and analytics is one of the important components in data science. However, due to privacy concerns, data should be enforced with strong privacy guarantees before sharing. Different privacy-preserving approaches were developed for privacy preserving data sharing; however, identifying the best privacy-preservation approach for the privacy-preservation of a certain dataset is still a challenge. Different parameters can influence the efficacy of the process, such as the characteristics of the input dataset, the strength of the privacy-preservation approach, and the expected level of utility of the resulting dataset (on the corresponding data mining application such as classification). This paper presents a framework named \underline{P}rivacy \underline{P}reservation \underline{a}s \underline{a} \underline{S}ervice (PPaaS) to reduce this complexity. The proposed method employs selective privacy preservation via data perturbation and looks at different dynamics that can influence the quality of the privacy preservation of a dataset. PPaaS includes pools of data perturbation methods, and for each application and the input dataset, PPaaS selects the most suitable data perturbation approach after rigorous evaluation. It enhances the usability of privacy-preserving methods within its pool; it is a generic platform that can be used to sanitize big data in a granular, application-specific manner by employing a suitable combination of diverse privacy-preserving algorithms to provide a proper balance between privacy and utility.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes