LGMar 9, 2024

DiffRed: Dimensionality Reduction guided by stable rank

arXiv:2403.05882v14 citationsh-index: 2AISTATS
Originality Incremental advance
AI Analysis

This work addresses the challenge of efficient and accurate dimensionality reduction for large-scale datasets, offering a novel hybrid method that improves upon existing techniques like PCA and random maps.

The paper tackles the problem of dimensionality reduction by proposing DiffRed, a technique that combines principal component analysis with random projections to achieve tighter theoretical bounds on distortion metrics like Stress and M1, and demonstrates in experiments that it reduces Stress by 54% compared to PCA when mapping a 6 million-dimensional dataset to 10 dimensions.

In this work, we propose a novel dimensionality reduction technique, DiffRed, which first projects the data matrix, A, along first $k_1$ principal components and the residual matrix $A^{*}$ (left after subtracting its $k_1$-rank approximation) along $k_2$ Gaussian random vectors. We evaluate M1, the distortion of mean-squared pair-wise distance, and Stress, the normalized value of RMS of distortion of the pairwise distances. We rigorously prove that DiffRed achieves a general upper bound of $O\left(\sqrt{\frac{1-p}{k_2}}\right)$ on Stress and $O\left(\frac{(1-p)}{\sqrt{k_2*ρ(A^{*})}}\right)$ on M1 where $p$ is the fraction of variance explained by the first $k_1$ principal components and $ρ(A^{*})$ is the stable rank of $A^{*}$. These bounds are tighter than the currently known results for Random maps. Our extensive experiments on a variety of real-world datasets demonstrate that DiffRed achieves near zero M1 and much lower values of Stress as compared to the well-known dimensionality reduction techniques. In particular, DiffRed can map a 6 million dimensional dataset to 10 dimensions with 54% lower Stress than PCA.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes