DS CR LG · Sep 2, 2020

Differentially private $k$-means clustering via exponential mechanism and max cover

arXiv:2009.01220v1 · 7.39 citations

Originality: Incremental advance

AI Analysis

This work addresses privacy-preserving clustering for data analysis, offering incremental improvements in error bounds for practical applications.

The paper tackles differentially private k-means clustering with a new algorithm that reduces the additive error to O(Δ²(k log² n log(1/δ)/ε + k√(d log(1/δ))/ε)) while keeping the multiplicative error constant; experiments show an improvement over prior methods.
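The terms "additive" and "multiplicative" error refer to the standard shape of a private clustering guarantee. A minimal statement of that shape is given below. The symbols $X$ (input dataset), $C$ (the $k$ returned centers), $\alpha$, $\beta$ and $\mathrm{OPT}_k$ are our notation and may differ from the paper's. The probability qualifier carried by the actual theorem is also omitted.

$$\mathrm{cost}(X, C) = \sum_{x \in X} \min_{c \in C} \lVert x - c \rVert_2^2, \qquad \mathrm{cost}(X, C) \le \alpha \cdot \mathrm{OPT}_k(X) + \beta,$$

$$\alpha = O(1), \qquad \beta = O\!\left(\Delta^2 \left(\frac{k \log^2 n \,\log(1/\delta)}{\varepsilon} + \frac{k \sqrt{d \log(1/\delta)}}{\varepsilon}\right)\right).$$

Here $\mathrm{OPT}_k(X)$ is the smallest cost over all sets of $k$ centers. The additive term $\beta$ is fixed by $n$, $d$, $k$, $\Delta$ and the privacy parameters, not by how well the data cluster. It therefore dominates on well-clustered inputs where $\mathrm{OPT}_k(X)$ is small.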

We introduce a new $(\varepsilon_p, \delta_p)$-differentially private algorithm for the $k$-means clustering problem. Given a dataset in Euclidean space $\mathbb{R}^d$, the $k$-means clustering problem asks for $k$ points in that space that minimise the sum of squared Euclidean distances from each data point to its closest point among the $k$ returned. Privacy-preserving methods with good theoretical guarantees already exist for this problem [Balcan et al., 2017; Kaplan and Stemmer, 2018]. In practice, however, it is the additive error that dictates how well these methods perform. By reducing the problem to a sequence of instances of maximum coverage on a grid, we derive a new method that achieves lower additive error than previous works. For input datasets with cardinality $n$ and diameter $\Delta$, our algorithm has an $O\big(\Delta^2 (k \log^2 n \log(1/\delta_p)/\varepsilon_p + k\sqrt{d \log(1/\delta_p)}/\varepsilon_p)\big)$ additive error whilst maintaining constant multiplicative error. We conclude with experiments that show an improvement over previously implemented work for this problem.
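The title names two ingredients, the exponential mechanism and maximum coverage. The following is a minimal sketch of one common way to combine them: a textbook private greedy max-cover step. It is not the paper's algorithm. It assumes the following:

* Candidate centers come from a data-independent grid over the public domain $[0, 1]^d$. The paper's actual grid construction and ball radii are not given here.
* Each of $k$ rounds uses the exponential mechanism to pick the ball covering the most still-uncovered points.
* Each round spends $\varepsilon/k$.

The names `exponential_mechanism`, `grid_candidates`, `private_greedy_max_cover`, `radius` and `spacing` are hypothetical.

```python
import itertools

import numpy as np


def exponential_mechanism(scores, epsilon, sensitivity, rng):
    """Pick an index with probability proportional to exp(eps * score / (2 * sensitivity))."""
    scores = np.asarray(scores, dtype=float)
    logits = epsilon * scores / (2.0 * sensitivity)
    logits -= logits.max()  # shift for numerical stability; the distribution is unchanged
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(scores), p=probs))


def grid_candidates(d, spacing):
    """Data-independent candidate centers (hypothetical): a regular grid over [0, 1]^d."""
    axis = np.arange(0.0, 1.0 + 1e-9, spacing)
    return np.array(list(itertools.product(axis, repeat=d)))


def private_greedy_max_cover(points, k, radius, spacing, epsilon, seed=0):
    """Greedy max coverage for k rounds, choosing each ball with the exponential mechanism.

    Each round spends epsilon / k, so the whole run is epsilon-DP by basic composition.
    Illustrative sketch only, not the paper's method.
    """
    rng = np.random.default_rng(seed)
    candidates = grid_candidates(points.shape[1], spacing)
    # covers[i, j] is True when point j lies within `radius` of candidate i
    dists = np.linalg.norm(candidates[:, None, :] - points[None, :, :], axis=2)
    covers = dists <= radius
    uncovered = np.ones(len(points), dtype=bool)
    chosen = []
    for _ in range(k):
        # marginal gain of each candidate: number of uncovered points it would newly cover
        gains = (covers & uncovered).sum(axis=1)
        # adding or removing one data point changes every gain by at most 1
        idx = exponential_mechanism(gains, epsilon / k, sensitivity=1.0, rng=rng)
        chosen.append(candidates[idx])
        uncovered &= ~covers[idx]
    return np.array(chosen)


if __name__ == "__main__":
    demo_rng = np.random.default_rng(1)
    # two tight synthetic clusters inside [0, 1]^2 (illustrative data only)
    pts = np.clip(np.vstack([demo_rng.normal(0.2, 0.03, (100, 2)),
                             demo_rng.normal(0.8, 0.03, (100, 2))]), 0.0, 1.0)
    print(private_greedy_max_cover(pts, k=2, radius=0.15, spacing=0.1, epsilon=1.0))
```

The grid must not depend on the data, or its placement would itself leak information. Basic composition keeps the accounting simple; tighter accounting (for example advanced composition, which trades some $\delta$ for a smaller per-round cost) is possible. The full candidate-by-point distance matrix is only practical for small $d$, since the grid has roughly $(1/\text{spacing} + 1)^d$ points.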

