STLG · Aug 6, 2013

Empirical entropy, minimax regret and minimax risk

arXiv:1308.1147v3 · 86 citations
Originality: Highly original
AI Analysis

This work addresses fundamental problems in statistical learning theory. It shows that minimax risk and minimax regret rates coincide when the empirical $\varepsilon$-entropy grows as $\varepsilon^{-p}$ with $p\in(0,2)$, a result that is foundational for understanding model misspecification and optimal estimation in machine learning.

The paper studies statistical learning in the random design regression model with square loss. It proposes a method that aggregates empirical risk minimizers, establishes sharp oracle inequalities for its risk, and shows that when the empirical $\varepsilon$-entropy grows as $\varepsilon^{-p}$, the excess risk attains the rate $n^{-2/(2+p)}$ for $p\in(0,2)$ and $n^{-1/p}$ for $p>2$. It concludes that for $p\in(0,2)$ the minimax risk and minimax regret rates coincide, so well-specified and misspecified models share the same optimal rates, whereas for $p>2$ the minimax regret rates are, in general, slower.
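The method summarized here is built from empirical risk minimizers (ERM) and, according to the abstract, reduces to ERM in an extreme case. As background only, the sketch below shows plain ERM with square loss over a finite class of candidate predictors. It is not the authors' aggregation procedure; the function names and the toy class of constant predictors are hypothetical illustrations.

```python
from typing import Callable, Sequence, Tuple

Predictor = Callable[[float], float]


def empirical_square_risk(f: Predictor, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Average squared error of predictor f on the sample (xs, ys)."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def erm_square_loss(
    candidates: Sequence[Predictor], xs: Sequence[float], ys: Sequence[float]
) -> Tuple[int, float]:
    """Return (index, empirical risk) of the candidate with the smallest
    empirical square loss; ties go to the earliest candidate."""
    risks = [empirical_square_risk(f, xs, ys) for f in candidates]
    best = min(range(len(risks)), key=risks.__getitem__)
    return best, risks[best]


# Hypothetical toy class: constant predictors c in {0, 0.5, 1}.
levels = [0.0, 0.5, 1.0]
candidates = [lambda x, c=c: c for c in levels]
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 1.1, 0.9, 1.0]

idx, risk = erm_square_loss(candidates, xs, ys)
print(levels[idx], round(risk, 4))  # 1.0 0.005
```

A finite class keeps the minimization exact; for infinite classes, approaches of this kind typically work with a finite $\varepsilon$-net, which is where the empirical entropy enters.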

We consider the random design regression model with square loss. We propose a method that aggregates empirical minimizers (ERM) over appropriately chosen random subsets and reduces to ERM in the extreme case, and we establish sharp oracle inequalities for its risk. We show that, under the $\varepsilon^{-p}$ growth of the empirical $\varepsilon$-entropy, the excess risk of the proposed method attains the rate $n^{-2/(2+p)}$ for $p\in(0,2)$ and $n^{-1/p}$ for $p>2$ where $n$ is the sample size. Furthermore, for $p\in(0,2)$, the excess risk rate matches the behavior of the minimax risk of function estimation in regression problems under the well-specified model. This leads to the conclusion that the rates of statistical estimation in well-specified models (minimax risk) and in misspecified models (minimax regret) are equivalent in the regime $p\in(0,2)$. In other words, for $p\in(0,2)$ the problem of statistical learning enjoys the same minimax rate as the problem of statistical estimation. By contrast, for $p>2$ we show that the rates of the minimax regret are, in general, slower than for the minimax risk. Our oracle inequalities also imply the $v\log(n/v)/n$ rates for Vapnik-Chervonenkis type classes of dimension $v$ without the usual convexity assumption on the class; we show that these rates are optimal. Finally, for a slightly modified method, we derive a bound on the excess risk of $s$-sparse convex aggregation improving that of Lounici [Math. Methods Statist. 16 (2007) 246-259] and providing the optimal rate.
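The rates in the abstract can be compared numerically. The sketch below encodes the excess-risk exponent $r$ (rate $n^{-r}$) as stated for $p\in(0,2)$ and $p>2$, together with the $v\log(n/v)/n$ rate for VC-type classes. Two assumptions are labeled here: the well-specified minimax-risk exponent is taken to be the classical $2/(2+p)$ for all $p>0$, and the boundary case $p=2$, which the abstract does not state, is deliberately left out. All function names are hypothetical and constants are ignored.

```python
import math


def regret_rate_exponent(p: float) -> float:
    """Exponent r of the excess-risk rate n**(-r) under eps**(-p) entropy
    growth, as stated in the abstract (p = 2 is not covered here)."""
    if p <= 0:
        raise ValueError("p must be positive")
    if p < 2:
        return 2.0 / (2.0 + p)
    if p > 2:
        return 1.0 / p
    raise ValueError("p = 2 is the boundary case and is not handled by this sketch")


def risk_rate_exponent(p: float) -> float:
    """Assumed classical well-specified minimax-risk exponent 2/(2+p)."""
    if p <= 0:
        raise ValueError("p must be positive")
    return 2.0 / (2.0 + p)


def vc_rate(v: int, n: int) -> float:
    """The v*log(n/v)/n rate for a VC-type class of dimension v (constants omitted)."""
    return v * math.log(n / v) / n


for p in (1.0, 4.0):
    print(p, regret_rate_exponent(p), risk_rate_exponent(p))
# 1.0 0.6666666666666666 0.6666666666666666  (regret and risk rates agree)
# 4.0 0.25 0.3333333333333333                (regret rate is slower)
```

For $p>2$ the comparison $1/p < 2/(2+p)$ is equivalent to $2+p < 2p$, i.e. $p>2$, which is why the regret exponent falls below the risk exponent exactly in that regime.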
