Subspace Clustering via Thresholding and Spectral Clustering
This paper addresses subspace clustering for high-dimensional data analysis and offers a low-complexity algorithm with theoretical guarantees. The contribution is incremental in that it builds on existing spectral clustering methods.
The paper tackles the problem of clustering high-dimensional data points into unknown low-dimensional linear subspaces. It proposes a simple algorithm based on thresholding correlations followed by spectral clustering. The algorithm succeeds even when the subspaces intersect and when data points are subject to erasures, with both the subspace dimensions and the number of erasures allowed to scale linearly (up to a log-factor) in the ambient dimension. A simple outlier-detection scheme is also proposed.
We consider the problem of clustering a set of high-dimensional data points into sets of low-dimensional linear subspaces. The number of subspaces, their dimensions, and their orientations are unknown. We propose a simple and low-complexity clustering algorithm based on thresholding the correlations between the data points followed by spectral clustering. A probabilistic performance analysis shows that this algorithm succeeds even when the subspaces intersect, and when the dimensions of the subspaces scale (up to a log-factor) linearly in the ambient dimension. Moreover, we prove that the algorithm also succeeds for data points that are subject to erasures with the number of erasures scaling (up to a log-factor) linearly in the ambient dimension. Finally, we propose a simple scheme that provably detects outliers.
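The two steps named in the abstract (thresholding correlations, then spectral clustering) can be illustrated with a minimal NumPy sketch. It is not the authors' exact procedure: the function names, the neighbor count q, the use of raw absolute correlations as edge weights, the symmetrization A + A^T, and the small deterministic k-means are all assumptions made for illustration. The number of clusters k is also passed in by hand, whereas the abstract treats the number of subspaces as unknown. The toy data uses two orthogonal 3-dimensional subspaces of R^10 so that the outcome is easy to check.

```python
import numpy as np

def threshold_affinity(X, q):
    """Keep, for each point, its q largest absolute correlations (assumed rule)."""
    Xn = X / np.linalg.norm(X, axis=0, keepdims=True)  # unit-norm columns
    C = np.abs(Xn.T @ Xn)                               # |<x_i, x_j>|
    np.fill_diagonal(C, 0.0)                            # ignore self-correlation
    n = C.shape[0]
    idx = np.argpartition(-C, q - 1, axis=1)[:, :q]     # q largest per row
    rows = np.arange(n)[:, None]
    A = np.zeros_like(C)
    A[rows, idx] = C[rows, idx]
    return A + A.T                                      # symmetric affinity

def kmeans(Y, k, n_iter=50):
    """Small deterministic k-means with farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    dists = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def spectral_labels(A, k):
    """Normalized spectral clustering on a precomputed affinity matrix."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)                         # ascending eigenvalues
    Y = vecs[:, :k]                                     # k smallest eigenvectors
    Y = Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)
    return kmeans(Y, k)

def tsc_sketch(X, q, k):
    """Thresholding followed by spectral clustering (illustrative only)."""
    return spectral_labels(threshold_affinity(X, q), k)

# Toy data: 20 points in span(e1, e2, e3) and 20 points in span(e4, e5, e6).
rng = np.random.default_rng(0)
m, n_per = 10, 20
X = np.zeros((m, 2 * n_per))
X[0:3, :n_per] = rng.standard_normal((3, n_per))
X[3:6, n_per:] = rng.standard_normal((3, n_per))
labels = tsc_sketch(X, q=10, k=2)
```

In this toy setting the cross-subspace correlations are exactly zero, so every retained edge joins points of the same subspace and the two groups end up with different labels. The intersecting-subspace and erasure regimes analyzed in the paper are harder and are not exercised by this example.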