Theoretical Analysis of Sparse Subspace Clustering with Missing Entries
This addresses a theoretical gap for researchers in machine learning and computer vision, but it is incremental as it extends existing SSC analysis to incomplete data.
The paper tackles the problem of Sparse Subspace Clustering (SSC) with missing data by providing theoretical guarantees and showing that projecting zero-filled data onto the observation pattern improves performance, with analytical establishment of this improvement.
Sparse Subspace Clustering (SSC) is a popular unsupervised machine learning method for clustering data lying close to an unknown union of low-dimensional linear subspaces; a problem with numerous applications in pattern recognition and computer vision. Even though the behavior of SSC for complete data is by now well-understood, little is known about its theoretical properties when applied to data with missing entries. In this paper we give theoretical guarantees for SSC with incomplete data, and analytically establish that projecting the zero-filled data onto the observation pattern of the point being expressed leads to a substantial improvement in performance. The main insight that stems from our analysis is that even though the projection induces additional missing entries, this is counterbalanced by the fact that the projected and zero-filled data are in effect incomplete points associated with the union of the corresponding projected subspaces, with respect to which the point being expressed is complete. The significance of this phenomenon potentially extends to the entire class of self-expressive methods.