LG · CG · QM · ML · Sep 27, 2020

Parametric UMAP embeddings for representation and semi-supervised learning

arXiv:2009.12981v4 · 335 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient, scalable embeddings in machine learning, particularly for semi-supervised learning. Its contribution is incremental, since it builds on the existing UMAP framework.

The paper extends UMAP to a parametric method for dimensionality reduction. It shows that Parametric UMAP performs comparably to non-parametric UMAP while enabling fast online embedding of new data. It also improves classifier accuracy in semi-supervised learning by using UMAP as a regularizer that captures structure in unlabeled data.

UMAP is a non-parametric graph-based dimensionality reduction algorithm using applied Riemannian geometry and algebraic topology to find low-dimensional embeddings of structured data. The UMAP algorithm consists of two steps: (1) Compute a graphical representation of a dataset (fuzzy simplicial complex), and (2) Through stochastic gradient descent, optimize a low-dimensional embedding of the graph. Here, we extend the second step of UMAP to a parametric optimization over neural network weights, learning a parametric relationship between data and embedding. We first demonstrate that Parametric UMAP performs comparably to its non-parametric counterpart while conferring the benefit of a learned parametric mapping (e.g. fast online embeddings for new data). We then explore UMAP as a regularization, constraining the latent distribution of autoencoders, parametrically varying global structure preservation, and improving classifier accuracy for semi-supervised learning by capturing structure in unlabeled data. Google Colab walkthrough: https://colab.research.google.com/drive/1WkXVZ5pnMrm17m0YgmtoNjM_XHdnE5Vp?usp=sharing
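The second step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear encoder stands in for the neural network, the edge weights are made-up values standing in for the graph from step 1, the curve parameters a = b = 1 are assumed for simplicity, and the loss is summed over a few listed pairs rather than estimated by sampling. The sketch only evaluates a UMAP-style cross-entropy between graph weights and embedding similarities, and shows that a parametric mapping can embed an unseen point by a single forward pass.

```python
import math

def linear_encoder(weights, x):
    """Hypothetical stand-in for the neural network: z = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def low_dim_similarity(z_i, z_j, a=1.0, b=1.0):
    """q_ij = 1 / (1 + a * d^(2b)), with d the Euclidean distance in embedding space."""
    d2 = sum((u - v) ** 2 for u, v in zip(z_i, z_j))  # squared distance d^2
    return 1.0 / (1.0 + a * d2 ** b)                  # (d^2)^b == d^(2b)

def umap_cross_entropy(weights, data, edges, eps=1e-6):
    """Cross-entropy between graph weights p_ij and embedding similarities q_ij.

    The embedding is a function of the encoder weights, so minimizing this
    loss over `weights` is what makes the method parametric.
    """
    loss = 0.0
    for (i, j), p in edges.items():
        z_i = linear_encoder(weights, data[i])
        z_j = linear_encoder(weights, data[j])
        q = low_dim_similarity(z_i, z_j)
        q = min(max(q, eps), 1.0 - eps)  # clip to keep the logarithms finite
        loss += -p * math.log(q) - (1.0 - p) * math.log(1.0 - q)
    return loss

# Toy 3-D data: points 0 and 1 are close together, as are points 2 and 3.
data = [
    [0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0],
    [5.0, 5.0, 0.0],
    [5.1, 5.0, 0.0],
]

# Made-up edge weights standing in for the graph built in step 1.
edges = {(0, 1): 0.9, (2, 3): 0.9, (0, 2): 0.05, (1, 3): 0.05}

# Two candidate encoders mapping 3-D inputs to 2-D embeddings.
w_keep = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]      # keeps the first two coordinates
w_collapse = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]  # maps every toy point to the origin

loss_keep = umap_cross_entropy(w_keep, data, edges)
loss_collapse = umap_cross_entropy(w_collapse, data, edges)

# A learned mapping embeds unseen data with one forward pass.
new_point_embedding = linear_encoder(w_keep, [1.0, 2.0, 3.0])

print(loss_keep < loss_collapse)
print(new_point_embedding)
```

The encoder that preserves the two clusters receives the lower loss, which is the signal gradient descent over the weights would follow. No training is performed here; the sketch only compares two fixed weight settings.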

Code Implementations: 2 repos
