ST · LG · May 11, 2021

Non-Parametric Estimation of Manifolds from Noisy Data

arXiv:2105.04754v2 · 22 citations
Originality: Incremental advance
AI Analysis

This work addresses a fundamental challenge in manifold learning for data science applications, providing theoretical guarantees for non-parametric estimation with noisy samples.

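The paper tackles the problem of estimating a low-dimensional manifold and its tangent space from noisy high-dimensional data. It proves that the proposed algorithm achieves asymptotic convergence rates of n^{-k/(2k+d)} for point estimation and n^{-(k-1)/(2k+d)} for tangent space estimation, where n is the number of samples, d the manifold dimension, and k its order of smoothness. These rates match the ones known to be optimal for function estimation.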
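To make the two rates concrete, here is a minimal sketch that evaluates the exponents for a given smoothness order k and intrinsic dimension d. The function name `rate_exponents` and the example values are illustrative assumptions, not something taken from the paper.

```python
from fractions import Fraction
from typing import Tuple


def rate_exponents(k: int, d: int) -> Tuple[Fraction, Fraction]:
    """Return the exponents a, b such that the stated rates are n^(-a), n^(-b).

    a = k / (2k + d)        (point estimation)
    b = (k - 1) / (2k + d)  (tangent space estimation)
    """
    if k < 1 or d < 1:
        raise ValueError("k and d must be positive integers")
    denom = 2 * k + d
    return Fraction(k, denom), Fraction(k - 1, denom)


# Hypothetical example: a C^2 curve (k = 2, d = 1).
point_exp, tangent_exp = rate_exponents(k=2, d=1)

# Error scale predicted by the rates at n = 10**5 samples (constants ignored).
n = 10**5
point_scale = n ** (-float(point_exp))
tangent_scale = n ** (-float(tangent_exp))
```

For k = 2 and d = 1 the exponents are 2/5 and 1/5, so the tangent space estimate converges more slowly than the point estimate. Note that the ambient dimension D does not appear in either exponent.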

A common observation in data-driven applications is that high-dimensional data has a low intrinsic dimension, at least locally. In this work, we consider the problem of estimating a $d$-dimensional sub-manifold of $\mathbb{R}^D$ from a finite set of noisy samples. Assuming that the data was sampled uniformly from a tubular neighborhood of $\mathcal{M}\in \mathcal{C}^k$, a compact manifold without boundary, we present an algorithm that takes a point $r$ from the tubular neighborhood and outputs $\hat p_n\in \mathbb{R}^D$ and $\widehat{T_{\hat p_n}\mathcal{M}}$, an element in the Grassmannian $Gr(d, D)$. We prove that, as the number of samples $n\to\infty$, the point $\hat p_n$ converges to $p\in \mathcal{M}$ and $\widehat{T_{\hat p_n}\mathcal{M}}$ converges to $T_p\mathcal{M}$ (the tangent space at that point) with high probability. Furthermore, we show that the estimation yields asymptotic rates of convergence of $n^{-\frac{k}{2k + d}}$ for the point estimation and $n^{-\frac{k-1}{2k + d}}$ for the estimation of the tangent space. These rates are known to be optimal for the case of function estimation.
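The sampling model in the abstract can be illustrated with a toy case: the unit circle in $\mathbb{R}^2$ ($d = 1$, $D = 2$), with samples drawn uniformly from a tube of half-width `sigma` around it. The sketch below is not the paper's algorithm. It uses plain local PCA as a simple stand-in to show what the inputs (a query point $r$ in the tube) and outputs (a point estimate and a tangent direction) look like. The function names, `sigma`, the bandwidth `h`, and the query point are all assumptions chosen for illustration.

```python
import math
import random


def sample_tube(n, sigma, rng):
    """Draw n points uniformly (by area) from the sigma-tube of the unit circle."""
    r_lo2, r_hi2 = (1.0 - sigma) ** 2, (1.0 + sigma) ** 2
    pts = []
    for _ in range(n):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        # Sampling r^2 uniformly makes the radial density proportional to r,
        # which is what area-uniform sampling of an annulus requires.
        radius = math.sqrt(rng.uniform(r_lo2, r_hi2))
        pts.append((radius * math.cos(theta), radius * math.sin(theta)))
    return pts


def local_pca(points, query, h):
    """Stand-in estimator: mean and leading principal direction of the
    samples within distance h of the query point."""
    nbrs = [p for p in points if math.dist(p, query) <= h]
    if len(nbrs) < 2:
        raise ValueError("too few neighbours; increase h or n")
    m = len(nbrs)
    mx = sum(x for x, _ in nbrs) / m
    my = sum(y for _, y in nbrs) / m
    sxx = sum((x - mx) ** 2 for x, _ in nbrs) / m
    syy = sum((y - my) ** 2 for _, y in nbrs) / m
    sxy = sum((x - mx) * (y - my) for x, y in nbrs) / m
    # Major-axis angle of the symmetric 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), (math.cos(angle), math.sin(angle))


rng = random.Random(0)
points = sample_tube(n=5000, sigma=0.05, rng=rng)
query = (1.03, 0.0)  # a point r inside the tube
mean, tangent = local_pca(points, query, h=0.3)

# The closest point on the circle is (1, 0), whose tangent line is spanned
# by (0, 1). Directions are sign-ambiguous, so compare via |cosine|.
alignment = abs(tangent[1])
```

Here `alignment` close to 1 means the estimated direction is nearly parallel to the true tangent. The local mean is biased toward the inside of the circle by curvature, which is one reason a plain local average is only a crude point estimate.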

Code Implementations: 1 repo