Phase transitions in nonparametric regressions
This provides theoretical insights into regression performance limits for statisticians and machine learning researchers, though it is incremental in refining existing bounds.
This paper tackles the problem of determining minimax optimal rates for nonparametric regression with smooth functions, showing that the rate transitions from (log n)/(n log(log n)) to (1/n)^((2γ+2)/(2γ+3)) depending on sample size n relative to smoothness γ, with optimal smoothness exploitation varying accordingly.
When the unknown regression function of a single variable is known to have derivatives up to the $(γ+1)$th order bounded in absolute values by a common constant everywhere or a.e. (i.e., $(γ+1)$th degree of smoothness), the minimax optimal rate of the mean integrated squared error (MISE) is stated as $\left(\frac{1}{n}\right)^{\frac{2γ+2}{2γ+3}}$ in the literature. This paper shows that: (i) if $n\leq\left(γ+1\right)^{2γ+3}$, the minimax optimal MISE rate is $\frac{\log n}{n\log(\log n)}$ and the optimal degree of smoothness to exploit is roughly $\max\left\{ \left\lfloor \frac{\log n}{2\log\left(\log n\right)}\right\rfloor ,\,1\right\} $; (ii) if $n>\left(γ+1\right)^{2γ+3}$, the minimax optimal MISE rate is $\left(\frac{1}{n}\right)^{\frac{2γ+2}{2γ+3}}$ and the optimal degree of smoothness to exploit is $γ+1$. The fundamental contribution of this paper is a set of metric entropy bounds we develop for smooth function classes. Some of our bounds are original, and some of them improve and/or generalize the ones in the literature (e.g., Kolmogorov and Tikhomirov, 1959). Our metric entropy bounds allow us to show phase transitions in the minimax optimal MISE rates associated with some commonly seen smoothness classes as well as non-standard smoothness classes, and can also be of independent interest outside the nonparametric regression problems.