Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories
This work addresses the challenge of high-dimensional data analysis in machine learning by relaxing strict manifold assumptions, which is significant for practitioners dealing with real-world datasets that often deviate from ideal low-dimensional structures.
This paper studies deep nonparametric regression when the data do not lie exactly on a low-dimensional manifold, and introduces the effective Minkowski dimension to characterize data complexity. It proves that the sample complexity depends only on this dimension. For anisotropic Gaussian designs, the effective Minkowski dimension scales as $\mathcal{O}(\sqrt{\log n})$ under exponential eigenvalue decay and as $\mathcal{O}(n^γ)$ under polynomial decay, so deep networks can adapt to the intrinsic structure and circumvent the curse of dimensionality.
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to the intrinsic data structures. In real-world applications, the assumption that data lie exactly on a low-dimensional manifold is stringent. This paper introduces a relaxed assumption that the input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and that the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notion, the effective Minkowski dimension. We prove that the sample complexity of deep nonparametric regression depends only on the effective Minkowski dimension of $\mathcal{S}$, denoted by $p$. We further illustrate our theoretical findings by considering nonparametric regression with an anisotropic Gaussian random design $N(0,Σ)$, where $Σ$ is full rank. When the eigenvalues of $Σ$ have an exponential or polynomial decay, the effective Minkowski dimension of such a Gaussian random design is $p=\mathcal{O}(\sqrt{\log n})$ or $p=\mathcal{O}(n^γ)$, respectively, where $n$ is the sample size and $γ\in(0,1)$ is a small constant depending on the polynomial decay rate. Our theory shows that, when the manifold assumption does not hold, deep neural networks can still adapt to the effective Minkowski dimension of the data and circumvent the curse of the ambient dimensionality for moderate sample sizes.
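To make "concentrated around a subset" concrete, the following is a minimal numerical sketch, not the paper's definition of the effective Minkowski dimension. It assumes a diagonal $Σ$, for which the expected squared distance from $X \sim N(0,Σ)$ to the span of the top-$k$ eigendirections equals the sum of the remaining eigenvalues. It then reports the smallest $k$ for which that tail variance falls below $1/n$. The function name `tail_truncation_dim`, the tolerance $1/n$, the ambient dimension, and the decay parameters are all illustrative assumptions.

```python
import numpy as np


def tail_truncation_dim(eigvals, tol):
    """Smallest k such that the variance outside the top-k directions is <= tol.

    This is a simple proxy for "how many directions carry almost all the mass";
    it is NOT the effective Minkowski dimension as defined in the paper.
    """
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    # tail[k] = sum of lam[k:], the variance left after keeping k directions;
    # the appended 0.0 covers the case where all directions are kept.
    tail = np.concatenate([np.cumsum(lam[::-1])[::-1], [0.0]])
    # tail is non-increasing, so the first index meeting the tolerance is minimal.
    return int(np.argmax(tail <= tol))


d = 1000                              # ambient dimension (assumed)
k = np.arange(d)
exp_eigs = np.exp(-0.5 * k)           # exponential decay, rate 0.5 (assumed)
poly_eigs = (1.0 + k) ** (-3.0)       # polynomial decay, exponent 3 (assumed)

for n in (10**2, 10**4, 10**6):
    tol = 1.0 / n
    print(n, tail_truncation_dim(exp_eigs, tol), tail_truncation_dim(poly_eigs, tol))
```

Under these assumptions the proxy grows slowly with $n$ for exponential decay and much faster for polynomial decay, while staying below the ambient dimension $d$ in both cases. This matches the qualitative picture in the abstract, though the exact rates there depend on the paper's own definition.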