Developing Non-Stochastic Privacy-Preserving Policies Using Agglomerative Clustering
This work addresses privacy-preserving data release in non-stochastic settings, where statistical assumptions do not apply. The contribution appears incremental, since it builds on existing privacy measures and clustering techniques.
The paper tackles the problem of minimizing privacy leakage of sensitive information from publicly accessible data in a non-stochastic setting by generating quantized data, using agglomerative clustering algorithms to achieve locally optimal solutions for two privacy measures. It shows that the maximin information can be reduced by merging nodes in a confusability graph, relating this to probabilistic information-theoretic privacy.
We consider a non-stochastic privacy-preserving problem in which an adversary aims to infer sensitive information $S$ from publicly accessible data $X$ without using statistics. The goal is to generate and release a quantization $\hat{X}$ of $X$ that minimizes the privacy leakage of $S$ to $\hat{X}$ while maintaining a certain level of utility (or, inversely, bounding the quantization loss). The variables $S$ and $X$ are treated as bounded and non-probabilistic, but are otherwise general. We consider two existing non-stochastic privacy measures, namely the maximum uncertainty reduction $L_0(S \rightarrow \hat{X})$ and the refined information $I_*(S; \hat{X})$ (also called the maximin information) of $S$. For each privacy measure, we propose a corresponding agglomerative clustering algorithm that converges to a locally optimal quantization solution $\hat{X}$ by iteratively merging elements in the alphabet of $X$. To instantiate the solution to this problem, we consider two specific utility measures, the worst-case resolution of $X$ by observing $\hat{X}$ and the maximal distortion of the released data $\hat{X}$. We show that the value of the maximin information $I_*(S; \hat{X})$ can be determined by dividing the confusability graph into connected subgraphs. Hence, $I_*(S; \hat{X})$ can be reduced by merging nodes that connect subgraphs. The relation to probabilistic information-theoretic privacy is also studied by noting that the Gács-Körner common information is the stochastic version of $I_*$ and indicates the attainability of statistical indistinguishability.
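The connected-subgraph view of $I_*(S; \hat{X})$ can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes the standard definition of maximin information as the base-2 logarithm of the number of connected components in the graph that links each feasible pair $(s, \hat{x})$, and it assumes the joint range is given as a finite, non-empty list of pairs. The names `maximin_information`, `joint_range`, and `quantizer` are hypothetical and chosen for illustration only.

```python
import math


def maximin_information(joint_range, quantizer=None):
    """Return log2 of the number of connected components of the graph
    whose nodes are values of S and of the released symbol, with an edge
    for every feasible (s, x) pair after quantization.

    joint_range: non-empty iterable of feasible (s, x) pairs (assumed finite).
    quantizer:   optional dict mapping x to its released symbol x_hat;
                 values of x missing from the dict are released unchanged.
    """
    quantizer = quantizer or {}
    parent = {}  # union-find forest over tagged nodes

    def find(node):
        # Follow parent links to the root, halving the path on the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        parent[find(a)] = find(b)

    for s, x in joint_range:
        x_hat = quantizer.get(x, x)
        # Tag nodes so that a value of S never collides with a released symbol.
        union(("s", s), ("x", x_hat))

    if not parent:
        raise ValueError("joint_range must contain at least one pair")

    components = {find(node) for node in parent}
    return math.log2(len(components))


# Toy joint range: s=0 is compatible with 'a' and 'b', s=1 with 'c', s=2 with 'd'.
joint_range = [(0, "a"), (0, "b"), (1, "c"), (2, "d")]

no_merge = maximin_information(joint_range)
# Merging 'b' and 'c' joins two previously separate subgraphs.
one_merge = maximin_information(joint_range, {"b": "bc", "c": "bc"})
# Releasing a single symbol leaves one component, so nothing is leaked.
full_merge = maximin_information(joint_range, {x: "*" for _, x in joint_range})
```

In this toy case the unquantized graph has three components, the single merge leaves two, and the full merge leaves one, so the computed value drops from $\log_2 3$ to $1$ to $0$. Merging two symbols that already lie in the same component would leave the value unchanged, which is why only merges that connect subgraphs reduce $I_*$.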