Every Query Counts: Analyzing the Privacy Loss of Exploratory Data Analyses
This work addresses a privacy oversight in data analysis workflows, though its contribution is incremental because it focuses on quantifying known risks.
The paper quantifies the privacy loss from basic statistical functions used in exploratory data analysis, showing that leaving this step unaccounted for can significantly affect the overall privacy budget of a machine learning pipeline.
An exploratory data analysis is an essential step for every data analyst to gain insights, evaluate data quality and (if required) select a machine learning model for further processing. While privacy-preserving machine learning is on the rise, more often than not this initial analysis is not counted towards the privacy budget. In this paper, we quantify the privacy loss for basic statistical functions and highlight the importance of taking it into account when calculating the privacy-loss budget of a machine learning approach.
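To make the accounting argument concrete, the following is a minimal sketch of how exploratory queries could be charged against the same budget as later model training. It is not the paper's method: it assumes pure epsilon-differential privacy, basic sequential composition (the epsilons of successive queries add up), the Laplace mechanism, and a publicly known dataset size and value bounds. The names `PrivacyBudget`, `dp_count`, and `dp_mean` are hypothetical, and the floating-point noise sampling is illustrative rather than hardened for production use.

```python
import random


class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition (assumed)."""

    def __init__(self, total_epsilon):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        """Record the cost of one query, refusing it if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise RuntimeError("privacy budget exceeded")
        self.spent += epsilon

    @property
    def remaining(self):
        return self.total_epsilon - self.spent


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(data, epsilon, budget, rng):
    """Noisy count; adding or removing one record changes the count by at most 1."""
    budget.charge(epsilon)
    return len(data) + laplace_noise(1.0 / epsilon, rng)


def dp_mean(data, lower, upper, epsilon, budget, rng):
    """Noisy mean of values clipped to [lower, upper], assuming n is public."""
    if not data:
        raise ValueError("data must be non-empty")
    budget.charge(epsilon)
    clipped = [min(max(x, lower), upper) for x in data]
    # Replacing one record moves the mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / len(clipped)
    return sum(clipped) / len(clipped) + laplace_noise(sensitivity / epsilon, rng)
```

For example, with a total budget of 1.0, calling `dp_count` with epsilon 0.1 and `dp_mean` with epsilon 0.2 during exploration leaves only about 0.7 for the subsequent machine learning step. An analysis that skipped these charges would report the full 1.0 as still available.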