Clustering, multicollinearity, and singular vectors
This work addresses multicollinearity in data analysis and provides a theoretical foundation for identifying redundant features in supervised and unsupervised learning tasks. The contribution appears incremental, since it builds on existing matrix theory.
The paper tackles the problem of identifying redundant columns of a matrix A. It proves that, after reordering the columns of A, the matrix S = I - A†A (where A† is the pseudo-inverse of A) has a block-diagonal form in which each block corresponds to a set of linearly dependent columns. Applications include feature selection, clustering, and sensitivity analysis.
Let $A$ be a matrix with pseudo-inverse $A^{\dagger}$ and set $S=I-A^{\dagger}A$. We prove that, after re-ordering the columns of $A$, the matrix $S$ has a block-diagonal form where each block corresponds to a set of linearly dependent columns. This allows us to identify redundant columns in $A$. We explore some applications in supervised and unsupervised learning, especially feature selection, clustering, and the sensitivity of least squares solutions.
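
To make the block structure concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It forms $S=I-A^{\dagger}A$ numerically and groups columns by the connected components of the non-zero pattern of $S$, which is one plausible way to read off the blocks. The function name `dependent_column_groups` and the threshold `tol` are hypothetical choices made for illustration, and the example matrix is invented for this sketch.

```python
import numpy as np


def dependent_column_groups(A, tol=1e-8):
    """Group column indices of A by the non-zero blocks of S = I - pinv(A) @ A.

    Hypothetical helper: `tol` is an assumed numerical threshold below which
    an entry of S is treated as zero.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[1]
    # S is the orthogonal projector onto the null space of A.
    S = np.eye(n) - np.linalg.pinv(A) @ A
    adjacent = np.abs(S) > tol

    groups, seen = [], set()
    for start in range(n):
        # A (numerically) zero diagonal entry means the column takes part
        # in no linear dependency, so it forms no block.
        if start in seen or not adjacent[start, start]:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:  # depth-first search over the non-zero pattern of S
            j = stack.pop()
            group.append(j)
            for k in np.flatnonzero(adjacent[j]):
                k = int(k)
                if k not in seen:
                    seen.add(k)
                    stack.append(k)
        groups.append(sorted(group))
    return groups


# Invented example: column 2 = column 0 + column 1, column 4 = 2 * column 3,
# and column 5 is independent of all the others.
A = np.array([
    [1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 2, 0],
    [0, 0, 0, 0, 0, 1],
], dtype=float)

groups = dependent_column_groups(A)
print(groups)  # [[0, 1, 2], [3, 4]]
```

In this example the null space of $A$ is spanned by $(1,1,-1,0,0,0)$ and $(0,0,0,2,-1,0)$, so $S$ has one $3\times 3$ block and one $2\times 2$ block, while the row and column of the independent column 5 vanish. In floating point the result depends on the choice of `tol`, so nearly dependent columns may need a tuned threshold.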