Subspace Clustering of Subspaces: Unifying Canonical Correlation Analysis and Subspace Clustering
This work addresses a problem in high-dimensional data analysis where structure exists beyond individual vectors. It appears incremental, however, because it builds on and unifies existing subspace clustering and canonical correlation analysis methods.
The paper tackles the problem of clustering collections of tall matrices based on their column spaces, introducing a framework called Subspace Clustering of Subspaces (SCoS) that models each data sample directly as a matrix rather than as a vector. On real-world hyperspectral imaging datasets, it reports higher clustering accuracy and robustness than existing subspace clustering techniques, especially under high noise and interference.
We introduce a novel framework for clustering a collection of tall matrices based on their column spaces, a problem we term Subspace Clustering of Subspaces (SCoS). Unlike traditional subspace clustering methods that assume vectorized data, our formulation directly models each data sample as a matrix and clusters them according to their underlying subspaces. We establish conceptual links to Subspace Clustering and Generalized Canonical Correlation Analysis (GCCA), and clarify key differences that arise in this more general setting. Our approach is based on a Block Term Decomposition (BTD) of a third-order tensor constructed from the input matrices, enabling joint estimation of cluster memberships and partially shared subspaces. We provide the first identifiability results for this formulation and propose scalable optimization algorithms tailored to large datasets. Experiments on real-world hyperspectral imaging datasets demonstrate that our method achieves superior clustering accuracy and robustness, especially under high noise and interference, compared to existing subspace clustering techniques. These results highlight the potential of the proposed framework in challenging high-dimensional applications where structure exists beyond individual data vectors.
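To make the problem setting concrete, the following is a minimal sketch of clustering tall matrices by their column spaces. It is not the BTD-based method described above: it swaps in a simple baseline that compares column spaces through principal angles and groups matrices whose affinity exceeds a threshold. The function names (make_data, subspace_affinity, cluster_by_threshold), the synthetic data model, the threshold value, and the choice to stack the matrices along the third mode to form a tensor are all assumptions made for illustration.

```python
import numpy as np

def make_data(rng, dim=50, rank=3, n_per_cluster=5, n_clusters=2, noise=0.01):
    """Synthetic tall matrices (hypothetical model): each cluster shares a column space."""
    mats, labels = [], []
    for k in range(n_clusters):
        basis = rng.standard_normal((dim, rank))      # column space shared by cluster k
        for _ in range(n_per_cluster):
            mix = rng.standard_normal((rank, rank))   # sample-specific mixing of the basis
            mats.append(basis @ mix + noise * rng.standard_normal((dim, rank)))
            labels.append(k)
    return mats, np.array(labels)

def subspace_affinity(mats):
    """Pairwise affinity: mean squared cosine of the principal angles between column spaces."""
    # Thin QR gives an orthonormal basis for each column space.
    bases = [np.linalg.qr(X)[0] for X in mats]
    n = len(bases)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # Singular values of Qi^T Qj are the cosines of the principal angles.
            cosines = np.linalg.svd(bases[i].T @ bases[j], compute_uv=False)
            W[i, j] = np.mean(cosines ** 2)
    return W

def cluster_by_threshold(W, tau=0.5):
    """Label connected components of the graph with an edge wherever W[i, j] > tau."""
    n = W.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for seed in range(n):
        if labels[seed] >= 0:
            continue
        labels[seed] = current
        stack = [seed]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(W[i] > tau):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

rng = np.random.default_rng(0)
mats, truth = make_data(rng)

# One possible third-order tensor built from the inputs (assumed layout: dim x rank x N).
T = np.stack(mats, axis=2)

W = subspace_affinity(mats)
pred = cluster_by_threshold(W)
print("tensor shape:", T.shape)
print("predicted labels:", pred)
```

Matrices drawn from the same subspace have affinity close to 1, while independent random subspaces of dimension rank in a dim-dimensional space have expected affinity of about rank/dim, which is why a fixed threshold is enough in this easy, low-noise regime. Thresholding of this kind would not be expected to handle partially shared subspaces or strong interference, which are the cases the proposed framework targets.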