Filtrated Algebraic Subspace Clustering
This addresses a limitation in algebraic subspace clustering for researchers and practitioners, offering a more flexible method, though it is incremental as it builds on prior algebraic-geometric approaches.
The paper tackles the problem of subspace clustering for data lying near a union of linear subspaces, proposing a provably correct algorithm based on filtrations that handles arbitrary-dimensional subspaces without requiring equal dimensions or known numbers of subspaces, and demonstrates through experiments on synthetic and real data that it outperforms state-of-the-art methods in some cases.
Subspace clustering is the problem of clustering data that lie close to a union of linear subspaces. In the abstract form of the problem, where no noise or other corruptions are present, the data are assumed to lie in general position inside the algebraic variety of a union of subspaces, and the objective is to decompose the variety into its constituent subspaces. Prior algebraic-geometric approaches to this problem require the subspaces to be of equal dimension, or the number of subspaces to be known. Subspaces of arbitrary dimensions can still be recovered in closed form, in terms of all homogeneous polynomials of degree $m$ that vanish on their union, when an upper bound m on the number of the subspaces is given. In this paper, we propose an alternative, provably correct, algorithm for addressing a union of at most $m$ arbitrary-dimensional subspaces, based on the idea of descending filtrations of subspace arrangements. Our algorithm uses the gradient of a vanishing polynomial at a point in the variety to find a hyperplane containing the subspace S passing through that point. By intersecting the variety with this hyperplane, we obtain a subvariety that contains S, and recursively applying the procedure until no non-trivial vanishing polynomial exists, our algorithm eventually identifies S. By repeating this procedure for other points, our algorithm eventually identifies all the subspaces by returning a basis for their orthogonal complement. Finally, we develop a variant of the abstract algorithm, suitable for computations with noisy data. We show by experiments on synthetic and real data that the proposed algorithm outperforms state-of-the-art methods on several occasions, thus demonstrating the merit of the idea of filtrations.