STLGJun 16, 2023

Adversarially robust clustering with optimality guarantees

arXiv:2306.09977v34 citationsh-index: 7
Originality Highly original
AI Analysis

This addresses the vulnerability of optimal clustering methods to outliers, providing robust guarantees for applications where data integrity is critical.

The paper tackles the problem of clustering data from sub-Gaussian mixtures with adversarial outliers, proposing a robust algorithm based on the coordinatewise median that achieves the optimal mislabeling error rate in constant iterations under weak initialization.

We consider the problem of clustering data points coming from sub-Gaussian mixtures. Existing methods that provably achieve the optimal mislabeling error, such as the Lloyd algorithm, are usually vulnerable to outliers. In contrast, clustering methods seemingly robust to adversarial perturbations are not known to satisfy the optimal statistical guarantees. We propose a simple robust algorithm based on the coordinatewise median that obtains the optimal mislabeling rate even when we allow adversarial outliers to be present. Our algorithm achieves the optimal error rate in constant iterations when a weak initialization condition is satisfied. In the absence of outliers, in fixed dimensions, our theoretical guarantees are similar to that of the Lloyd algorithm. Extensive experiments on various simulated and public datasets are conducted to support the theoretical guarantees of our method.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes