PR · ST · ML · Jan 31, 2022

A Probabilistic Graph Coupling View of Dimension Reduction

arXiv:2201.13053v3 · 17 citations
AI Analysis

This work gives pairwise-similarity dimension reduction methods a probabilistic foundation, which may help researchers and practitioners in data science and machine learning interpret these methods and improve their performance.

The authors address the lack of probabilistic foundations in popular dimension reduction methods such as t-SNE and UMAP. They introduce a unifying statistical framework based on graph coupling, use it to expose a statistical deficiency in existing methods, and extend the model to correct that deficiency.

Most popular dimension reduction (DR) methods, such as t-SNE and UMAP, are based on minimizing a cost between input and latent pairwise similarities. Though widely used, these approaches lack clear probabilistic foundations to enable a full understanding of their properties and limitations. To that end, we introduce a unifying statistical framework based on the coupling of hidden graphs using cross entropy. These graphs induce a Markov random field dependency structure among the observations in both input and latent spaces. We show that existing pairwise similarity DR methods can be retrieved from our framework with particular choices of priors for the graphs. Moreover, this reveals that these methods suffer from a statistical deficiency that explains their poor performance in conserving coarse-grain dependencies. Our model is leveraged and extended to address this issue, while new links are drawn with Laplacian eigenmaps and PCA.
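To make "minimizing a cost between input and latent pairwise similarities" concrete, here is a minimal NumPy sketch of a t-SNE-style cross-entropy cost. It is not the paper's graph coupling model: it has no graph priors and no Markov random field. The function names, the fixed Gaussian bandwidth `sigma` (t-SNE instead calibrates bandwidths per point from a perplexity) and the random data are assumptions made for illustration.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X."""
    sq = np.sum(X**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.maximum(D, 0.0)  # clip tiny negatives caused by round-off

def gaussian_similarities(X, sigma=1.0):
    """Input-space similarities: Gaussian kernel, normalized to sum to 1.
    Assumption: one fixed bandwidth, no perplexity calibration."""
    W = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)  # no self-similarity
    return W / W.sum()

def student_similarities(Z):
    """Latent-space similarities: Student-t kernel (one degree of freedom),
    normalized to sum to 1, as in t-SNE."""
    W = 1.0 / (1.0 + pairwise_sq_dists(Z))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

def cross_entropy(P, Q, eps=1e-12):
    """Cross entropy -sum_ij P_ij log Q_ij over off-diagonal pairs."""
    mask = ~np.eye(P.shape[0], dtype=bool)
    return float(-np.sum(P[mask] * np.log(Q[mask] + eps)))

# Hypothetical data: 20 points in 5 dimensions and a random 2-D embedding.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
Z = rng.normal(size=(20, 2))

P = gaussian_similarities(X)
Q = student_similarities(Z)
loss = cross_entropy(P, Q)
print(f"cross-entropy cost of the random embedding: {loss:.4f}")
```

A DR method of this family would minimize `loss` with respect to `Z`, for example by gradient descent. Since `P` is fixed, this is equivalent to minimizing the KL divergence between `P` and `Q`.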

