CA-PCA: Manifold Dimension Estimation, Adapted for Curvature
This work addresses a specific bottleneck in the analysis of high-dimensional data: estimating the dimension of the manifold on which the data lie. It offers an incremental improvement over existing local PCA estimators.
The paper develops CA-PCA, a version of local PCA adapted for curvature, for manifold dimension estimation, and reports improved estimation accuracy across a wide range of settings.
The success of algorithms in the analysis of high-dimensional data is often attributed to the manifold hypothesis, which supposes that these data lie on or near a manifold of much lower dimension. It is often useful to determine or estimate the dimension of this manifold, for instance before performing dimension reduction. Existing methods for dimension estimation are calibrated using a flat unit ball. In this paper, we develop CA-PCA, a version of local PCA based instead on a calibration to a quadratic embedding, which accounts for the curvature of the underlying manifold. Numerous careful experiments show that this adaptation improves the estimator in a wide range of settings.
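To make the flat-unit-ball calibration concrete, the following is a minimal sketch of a local PCA dimension estimator of the kind the abstract describes as the existing baseline. It is not the CA-PCA method itself, and it omits the quadratic-embedding calibration. It relies on the standard fact that a uniform sample of a flat unit d-ball has covariance eigenvalues close to 1/(d + 2). The function name `flat_ball_dimension`, the parameter `k`, the least-squares template matching, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def flat_ball_dimension(points, center, k):
    """Local PCA dimension estimate at `center`, calibrated on a flat unit ball.

    Hypothetical helper for illustration; assumes len(points) > k.
    """
    ambient_dim = points.shape[1]
    dists = np.linalg.norm(points - center, axis=1)
    order = np.argsort(dists)
    radius = dists[order[k]]          # distance to the (k+1)-th nearest point
    neighbors = points[order[:k]]     # the k nearest points, inside that ball

    # Second-moment matrix about the center, rescaled so the ball has radius 1.
    diffs = (neighbors - center) / radius
    moment = diffs.T @ diffs / k
    eigenvalues = np.sort(np.linalg.eigvalsh(moment))[::-1]

    # For a uniform sample of a flat unit d-ball, the d nonzero eigenvalues
    # are each close to 1 / (d + 2). Pick the d whose template fits best.
    errors = []
    for d in range(1, ambient_dim + 1):
        template = np.zeros(ambient_dim)
        template[:d] = 1.0 / (d + 2)
        errors.append(np.linalg.norm(eigenvalues - template))
    return int(np.argmin(errors)) + 1


# Toy data (an assumption): a flat 2-dimensional square rotated into R^5.
rng = np.random.default_rng(0)
flat = np.zeros((2000, 5))
flat[:, :2] = rng.uniform(-1.0, 1.0, size=(2000, 2))
rotation, _ = np.linalg.qr(rng.normal(size=(5, 5)))
points = flat @ rotation

estimate = flat_ball_dimension(points, center=np.zeros(5), k=200)
print(estimate)
```

On this flat toy example the estimate should match the true dimension, 2. On a curved manifold the local eigenvalues drift away from the flat template, which is the mismatch the curvature-aware calibration in the paper is meant to address.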