STLG · Oct 12, 2021

Tangent Space and Dimension Estimation with the Wasserstein Distance

arXiv:2110.06357v4 · 14 citations
AI Analysis

This work addresses a fundamental problem in manifold learning and offers theoretical guarantees relevant to practical data analysis. Its contribution is incremental, however: it builds on Local PCA and supplies new bounds for it.

The paper tackles the problem of estimating the dimension and tangent spaces of a smooth manifold from noisy, non-uniformly distributed sample points, providing rigorous bounds on the required number of samples with explicitly described constants.

Consider a set of points sampled independently near a smooth compact submanifold of Euclidean space. We provide mathematically rigorous bounds on the number of sample points required to estimate both the dimension and the tangent spaces of that manifold with high confidence. The algorithm for this estimation is Local PCA, a local version of principal component analysis. Our results accommodate noisy, non-uniform data distributions in which the noise may vary across the manifold, and they allow simultaneous estimation at multiple points. Crucially, all of the constants appearing in our bounds are explicitly described. The proof uses a matrix concentration inequality to estimate covariance matrices and a Wasserstein distance bound to quantify the nonlinearity of the underlying manifold and the non-uniformity of the probability measure.
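The Local PCA idea described in the abstract can be sketched in a few lines: keep the samples within a radius of a base point, form their covariance matrix, and read the dimension and tangent space off its eigendecomposition. This is a minimal illustration under stated assumptions, not the paper's estimator. The helper name `local_pca`, the parameter `radius`, and the "largest eigenvalue gap" rule for choosing the dimension are all hypothetical choices made here; the paper's own selection rule and its sample-size bounds are not reproduced.

```python
import numpy as np


def local_pca(points, center, radius):
    """Hypothetical Local PCA helper: estimate the dimension and a tangent
    basis at `center` from the samples lying within `radius` of it."""
    # 1. Restrict to the neighbourhood of the base point.
    dists = np.linalg.norm(points - center, axis=1)
    local = points[dists < radius]
    if len(local) < 2:
        raise ValueError("too few samples in the neighbourhood")

    # 2. Local covariance matrix, centred at the neighbourhood mean.
    centred = local - local.mean(axis=0)
    cov = centred.T @ centred / len(local)

    # 3. Eigendecomposition; eigh returns ascending order, so reverse it.
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

    # 4. Dimension = position of the largest gap between consecutive
    #    eigenvalues (an assumed heuristic; the paper's rule may differ).
    gaps = eigvals[:-1] - eigvals[1:]
    dim = int(np.argmax(gaps)) + 1

    # 5. Tangent space estimate: span of the top-`dim` eigenvectors.
    return dim, eigvecs[:, :dim], eigvals


rng = np.random.default_rng(0)

# Noisy samples near the unit circle in the z = 0 plane of R^3 (a 1-manifold).
theta = rng.uniform(0.0, 2.0 * np.pi, size=2000)
circle = np.column_stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)])
circle += rng.normal(scale=0.01, size=circle.shape)
dim_c, basis_c, _ = local_pca(circle, np.array([1.0, 0.0, 0.0]), radius=0.3)

# Noise-free samples on the unit sphere in R^3 (a 2-manifold).
sphere = rng.normal(size=(5000, 3))
sphere /= np.linalg.norm(sphere, axis=1, keepdims=True)
dim_s, basis_s, _ = local_pca(sphere, np.array([0.0, 0.0, 1.0]), radius=0.4)

print("circle:", dim_c, basis_c.ravel())
print("sphere:", dim_s)
```

At the base point (1, 0, 0) the circle's tangent direction is the y-axis, so the single estimated basis vector should be close to ±(0, 1, 0); at the sphere's north pole the two estimated basis vectors should lie nearly in the xy-plane. The radius trades bias (curvature, which grows with the radius) against variance (fewer samples in a smaller ball), which is exactly the trade-off that sample-size bounds of the kind the paper proves make quantitative.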
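One general reason a Wasserstein bound is useful for covariance-based methods is that second-moment matrices depend Lipschitz-continuously on the measure in W_2. The short derivation below is a standard fact, offered as an illustration; it is not claimed to be the paper's exact lemma. For probability measures μ and ν on R^D with finite second moments, let (X, Y) be an optimal W_2 coupling and write M_μ = E[XX^T], M_ν = E[YY^T] for the (uncentred) second-moment matrices. Using XX^T − YY^T = X(X−Y)^T + (X−Y)Y^T, the identity ‖uv^T‖ = ‖u‖‖v‖, and Cauchy–Schwarz:

```latex
\begin{aligned}
\|M_\mu - M_\nu\|_{\mathrm{op}}
  &= \bigl\|\mathbb{E}[X(X-Y)^\top] + \mathbb{E}[(X-Y)Y^\top]\bigr\|_{\mathrm{op}} \\
  &\le \mathbb{E}\bigl[\|X\|\,\|X-Y\|\bigr] + \mathbb{E}\bigl[\|X-Y\|\,\|Y\|\bigr] \\
  &\le \Bigl(\sqrt{\mathbb{E}\|X\|^2} + \sqrt{\mathbb{E}\|Y\|^2}\Bigr)\, W_2(\mu,\nu).
\end{aligned}
```

For example, if μ is the local sample distribution and ν = P_#μ is its image under orthogonal projection P onto the true tangent plane, then (X, PX) is a valid coupling, so W_2(μ, P_#μ) ≤ (E‖X − PX‖²)^{1/2}, which is small when the manifold is nearly flat at the scale of the neighbourhood. Close covariance matrices then have close top eigenspaces by a standard eigenvector perturbation argument such as the Davis–Kahan theorem.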

