LG, CR, DS · Jan 31, 2023

Differentially-Private Hierarchical Clustering with Provable Approximation Guarantees

arXiv:2302.00037v2 · 10 citations · h-index: 62
Originality: Highly original
AI Analysis

This work addresses privacy-preserving hierarchical clustering. Its theoretical guarantees lay groundwork for applications in sensitive domains, although extending differential privacy to this particular clustering framework is an incremental step.

The paper tackles the problem of performing hierarchical clustering under differential privacy, establishing strong lower bounds for additive error and providing both polynomial-time and exponential-time algorithms with provable approximation guarantees, including a near-optimal algorithm for graphs modeled by the stochastic block model.

Hierarchical clustering is a popular unsupervised machine learning method with decades of history and numerous applications. We initiate the study of differentially private approximation algorithms for hierarchical clustering under the rigorous framework introduced by Dasgupta (2016). We show strong lower bounds for the problem: any $ε$-DP algorithm must exhibit $\Omega(|V|^2/ε)$ additive error for an input dataset $V$. We then exhibit a polynomial-time approximation algorithm with $O(|V|^{2.5}/ε)$ additive error and an exponential-time algorithm that meets the lower bound. To overcome the lower bound, we focus on the stochastic block model, a popular model of graphs, and, under a separation assumption on the blocks, propose a private $1+o(1)$ approximation algorithm that also recovers the blocks exactly. Finally, we perform an empirical study of our algorithms and validate their performance.
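To make Dasgupta's objective concrete, the sketch below computes the cost of a hierarchy $T$, $\mathrm{cost}(T)=\sum_{\{u,v\}} w(u,v)\,|\mathrm{leaves}(T[u \vee v])|$, where $T[u \vee v]$ is the smallest cluster containing both $u$ and $v$. It pairs this with a brute-force exponential mechanism over all binary hierarchies of a tiny graph. This is an illustrative sketch under stated assumptions, not the authors' algorithm: the helper names (`all_hierarchies`, `dasgupta_cost`, `private_hierarchy`) are hypothetical, edge weights are assumed to lie in $[0,1]$, and neighboring graphs are assumed to differ in one edge weight by at most 1, so any tree's cost changes by at most $|V|$. Enumerating all $(2|V|-3)!!$ trees is only feasible for a handful of nodes.

```python
import itertools
import math
import random


def all_hierarchies(items):
    """Yield every rooted binary hierarchy over `items` as nested 2-tuples."""
    items = tuple(items)
    if len(items) == 1:
        yield items[0]
        return
    first, rest = items[0], items[1:]
    # Keep `first` in the left child so each unordered split is produced once;
    # k < len(rest) guarantees the right child is non-empty.
    for k in range(len(rest)):
        for left_rest in itertools.combinations(rest, k):
            left = (first,) + left_rest
            right = tuple(x for x in rest if x not in left_rest)
            for left_tree in all_hierarchies(left):
                for right_tree in all_hierarchies(right):
                    yield (left_tree, right_tree)


def leaves(tree):
    """Return the set of leaves under a (sub)tree."""
    if not isinstance(tree, tuple):
        return {tree}
    left, right = tree
    return leaves(left) | leaves(right)


def dasgupta_cost(tree, weights):
    """Dasgupta's cost: each edge pays its weight times the size of the
    smallest cluster containing both endpoints.

    `weights` maps frozenset({u, v}) -> similarity (missing pairs count as 0).
    """
    if not isinstance(tree, tuple):
        return 0.0
    left, right = tree
    L, R = leaves(left), leaves(right)
    # Edges split at this node have this node as their lowest common ancestor.
    crossing = sum(weights.get(frozenset((u, v)), 0.0) for u in L for v in R)
    return ((len(L) + len(R)) * crossing
            + dasgupta_cost(left, weights)
            + dasgupta_cost(right, weights))


def private_hierarchy(nodes, weights, epsilon, rng=random):
    """Hypothetical exponential-mechanism baseline over all hierarchies.

    Assumes weights in [0, 1] and that neighboring inputs differ in one edge
    weight by at most 1, so any tree's cost changes by at most len(nodes).
    """
    sensitivity = len(nodes)
    trees = list(all_hierarchies(nodes))
    costs = [dasgupta_cost(t, weights) for t in trees]
    best = min(costs)
    # Lower cost -> exponentially higher probability; shifting by `best`
    # keeps the largest score at exp(0) = 1 for numerical stability.
    scores = [math.exp(-epsilon * (c - best) / (2 * sensitivity)) for c in costs]
    return rng.choices(trees, weights=scores, k=1)[0]


if __name__ == "__main__":
    # Two tightly linked pairs {0, 1} and {2, 3}, joined by a weak edge.
    w = {frozenset((0, 1)): 1.0, frozenset((2, 3)): 1.0, frozenset((1, 2)): 0.1}
    print(dasgupta_cost(((0, 1), (2, 3)), w))
    print(private_hierarchy(range(4), w, epsilon=1.0, rng=random.Random(0)))
```

On the four-node example, the lowest-cost hierarchy merges the two strongly linked pairs first. With a very large ε the mechanism essentially always returns that tree, while a small ε spreads probability over costlier hierarchies; this trade-off between privacy and additive error is what the paper's bounds quantify.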

Code Implementations: 1 repo
