STMLSep 11, 2017

Manifold Learning Using Kernel Density Estimation and Local Principal Components Analysis

arXiv:1709.03615v121 citations
Originality Incremental advance
AI Analysis

This provides a theoretical foundation for manifold learning algorithms, addressing a gap where existing methods often fail to guarantee manifold properties, which is important for researchers in dimensionality reduction and geometric data analysis.

The paper tackles the problem of recovering a d-dimensional manifold from noiseless samples by proposing two approximate squared-distance functions based on kernel density estimation and local PCA, which provably yield a manifold estimate arbitrarily close to the true manifold in Hausdorff distance.

We consider the problem of recovering a $d-$dimensional manifold $\mathcal{M} \subset \mathbb{R}^n$ when provided with noiseless samples from $\mathcal{M}$. There are many algorithms (e.g., Isomap) that are used in practice to fit manifolds and thus reduce the dimensionality of a given data set. Ideally, the estimate $\mathcal{M}_\mathrm{put}$ of $\mathcal{M}$ should be an actual manifold of a certain smoothness; furthermore, $\mathcal{M}_\mathrm{put}$ should be arbitrarily close to $\mathcal{M}$ in Hausdorff distance given a large enough sample. Generally speaking, existing manifold learning algorithms do not meet these criteria. Fefferman, Mitter, and Narayanan (2016) have developed an algorithm whose output is provably a manifold. The key idea is to define an approximate squared-distance function (asdf) to $\mathcal{M}$. Then, $\mathcal{M}_\mathrm{put}$ is given by the set of points where the gradient of the asdf is orthogonal to the subspace spanned by the largest $n - d$ eigenvectors of the Hessian of the asdf. As long as the asdf meets certain regularity conditions, $\mathcal{M}_\mathrm{put}$ is a manifold that is arbitrarily close in Hausdorff distance to $\mathcal{M}$. In this paper, we define two asdfs that can be calculated from the data and show that they meet the required regularity conditions. The first asdf is based on kernel density estimation, and the second is based on estimation of tangent spaces using local principal components analysis.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes