Intrinsic Dimension Estimation Using Wasserstein Distances
This work addresses a question relevant to machine learning practitioners, namely how to quantify the low-dimensional structure of data, and offers incremental improvements in both intrinsic dimension estimation and the analysis of GANs.
The paper tackles the problem of estimating the intrinsic dimension of high-dimensional data from finite samples. It introduces a new estimator with finite-sample guarantees and applies the same techniques to derive sample complexity bounds for GANs that depend only on the intrinsic dimension.
It has long been thought that the high-dimensional data encountered in many practical machine learning tasks have low-dimensional structure, i.e., that the manifold hypothesis holds. A natural question, then, is how to estimate the intrinsic dimension of a given population distribution from a finite sample. We introduce a new estimator of the intrinsic dimension and provide finite-sample, non-asymptotic guarantees. We then apply our techniques to obtain new sample complexity bounds for Generative Adversarial Networks (GANs) that depend only on the intrinsic dimension of the data.
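The abstract does not state the estimator itself. As a rough illustration of how Wasserstein distances can reveal intrinsic dimension, the sketch below relies on the classical convergence rate of empirical measures: for a distribution spread over a d-dimensional set with d ≥ 3, the expected W_1 distance between two independent n-point empirical measures shrinks like n^{-1/d}, whatever the ambient dimension. Comparing sample sizes n and αn then gives d ≈ log α / log(W(n)/W(αn)). This ratio construction, the function names `assignment_cost`, `w1_empirical` and `estimate_dim`, and the parameters `alpha` and `repeats` are assumptions made for illustration only; the sketch is not the paper's estimator and carries none of its finite-sample guarantees. W_1 is computed exactly with the Hungarian algorithm, using only NumPy.

```python
import numpy as np


def assignment_cost(C):
    """Optimal total cost of a perfect matching for a square cost matrix C.

    Hungarian algorithm with row/column potentials (O(n^3)); index 0 of
    u, v, p, way is a dummy so that p[j] == 0 means "column j is free".
    """
    n = C.shape[0]
    u = np.zeros(n + 1)               # row potentials
    v = np.zeros(n + 1)               # column potentials
    p = np.zeros(n + 1, dtype=int)    # p[j] = 1-based row matched to column j
    way = np.zeros(n + 1, dtype=int)  # predecessor column on the augmenting path
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv = np.full(n + 1, np.inf)      # best reduced cost seen per column
        used = np.zeros(n + 1, dtype=bool)
        while True:                        # Dijkstra-like search for a free column
            used[j0] = True
            i0 = p[j0]
            free = np.nonzero(~used[1:])[0] + 1
            cur = C[i0 - 1, free - 1] - u[i0] - v[free]
            better = cur < minv[free]
            minv[free[better]] = cur[better]
            way[free[better]] = j0
            k = np.argmin(minv[free])
            j1, delta = free[k], minv[free[k]]
            u[p[used]] += delta            # shift potentials of visited rows/cols
            v[used] -= delta
            minv[~used] -= delta
            j0 = j1
            if p[j0] == 0:                 # reached an unmatched column
                break
        while j0 != 0:                     # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return C[p[1:] - 1, np.arange(n)].sum()


def w1_empirical(X, Y):
    """W_1 between uniform empirical measures on the rows of X and Y.

    With equal sizes and equal weights an optimal coupling can be taken to be
    a permutation (Birkhoff-von Neumann), so W_1 is an assignment problem.
    """
    C = np.sqrt(((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1))
    return assignment_cost(C) / len(X)


def estimate_dim(sample, n, alpha=4, repeats=5, rng=None):
    """Hypothetical scaling-ratio estimate of intrinsic dimension.

    Assumes W_1 between two independent size-m empirical measures behaves like
    c * m**(-1/d) (the d >= 3 regime), so that
    d ~= log(alpha) / log(W(n) / W(alpha * n)).
    Needs len(sample) >= 2 * alpha * n (disjoint subsamples are drawn).
    """
    rng = np.random.default_rng(rng)

    def mean_log_w(m):
        logs = []
        for _ in range(repeats):
            idx = rng.choice(len(sample), size=2 * m, replace=False)
            logs.append(np.log(w1_empirical(sample[idx[:m]], sample[idx[m:]])))
        return np.mean(logs)

    return np.log(alpha) / (mean_log_w(n) - mean_log_w(alpha * n))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, D = 3, 10
    Q, _ = np.linalg.qr(rng.standard_normal((D, d)))  # D x d, orthonormal columns
    data = rng.uniform(size=(4000, d)) @ Q.T          # 3-dim cube inside R^10
    print(estimate_dim(data, n=64, alpha=4, rng=1))   # compare with the true d = 3
```

The demo embeds a uniform sample from the 3-dimensional unit cube in R^10 through a random orthonormal map, so pairwise distances, and hence the estimate, do not depend on the ambient dimension. The Hungarian step costs O(n^3), which limits this sketch to a few hundred points per subsample. For d ≤ 2 the rate changes (n^{-1/2} for d = 1, with an extra logarithmic factor for d = 2), so the ratio formula does not apply as written; with small n the log-ratio is also noisy, which is why several repeats are averaged.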