Robust Unsupervised Learning via L-Statistic Minimization
This work is significant for researchers and practitioners in unsupervised learning who need robust algorithms against data distribution perturbations, offering an incremental improvement to existing methods.
This paper addresses the problem of designing unsupervised learning algorithms robust to data distribution perturbations. It proposes a general descent algorithm that minimizes an L-statistic criterion, weighting smaller losses more, and demonstrates its effectiveness on k-means clustering and principal subspace analysis.
Designing learning algorithms that are resistant to perturbations of the underlying data distribution is a problem of wide practical and theoretical importance. We present a general approach to this problem focusing on unsupervised learning. The key assumption is that the perturbing distribution is characterized by larger losses relative to a given class of admissible models. This is exploited by a general descent algorithm which minimizes an $L$-statistic criterion over the model class, weighting small losses more. Our analysis characterizes the robustness of the method in terms of bounds on the reconstruction error relative to the underlying unperturbed distribution. As a byproduct, we prove uniform convergence bounds with respect to the proposed criterion for several popular models in unsupervised learning, a result which may be of independent interest. Numerical experiments with k-means clustering and principal subspace analysis demonstrate the effectiveness of our approach.
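To make the criterion concrete, the following is a minimal sketch of one way an $L$-statistic could be minimized for k-means. It is not the paper's algorithm. The function names (`l_statistic`, `rank_weights`, `robust_kmeans`), the `keep` parameter, and the hard-trimming weight vector are all assumptions made for illustration. The sketch alternates between ranking points by their current loss and taking a weighted centroid step. When the weights are non-increasing in rank, each of these two steps cannot increase the criterion for the current weight-to-point matching.

```python
import numpy as np

def l_statistic(losses, weights):
    """Weighted sum of sorted losses: sum_i weights[i] * losses_(i),
    where losses_(1) <= losses_(2) <= ... (smallest loss first)."""
    return float(np.dot(weights, np.sort(losses)))

def rank_weights(n, keep=0.8):
    """Hypothetical weight choice (an assumption, not from the paper):
    uniform weight on the smallest `keep` fraction of losses, zero elsewhere."""
    m = max(1, int(np.floor(keep * n)))
    w = np.zeros(n)
    w[:m] = 1.0 / m
    return w

def robust_kmeans(X, k, weights, n_iter=50, seed=0):
    """Alternating descent sketch: rank points by loss, then update each
    center as the weighted mean of its assigned points."""
    rng = np.random.default_rng(seed)
    n = len(X)
    centers = X[rng.choice(n, size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every point to every center, shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        losses = d2[np.arange(n), labels]
        # Give the i-th smallest loss the i-th weight.
        order = np.argsort(losses)
        w = np.empty(n)
        w[order] = weights
        for j in range(k):
            mask = labels == j
            total = w[mask].sum()
            if total > 0:  # leave a center unchanged if it has no weighted points
                centers[j] = (w[mask, None] * X[mask]).sum(axis=0) / total
    return centers

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    inliers = rng.normal(0.0, 1.0, size=(90, 2))
    outliers = rng.normal(50.0, 1.0, size=(10, 2))  # perturbing distribution
    X = np.vstack([inliers, outliers])
    centers = robust_kmeans(X, k=1, weights=rank_weights(len(X), keep=0.8))
    print("plain mean:   ", X.mean(axis=0))
    print("robust center:", centers[0])
```

In this toy setting the perturbing points incur large losses under any single-center model, so they end up in the zero-weight tail and the recovered center stays near the inlier cluster, whereas the plain mean is pulled toward the outliers. Smoother weight vectors that decay with rank can be passed in place of `rank_weights` without changing the rest of the sketch.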