LG · Dec 4, 2020

Characterization of Excess Risk for Locally Strongly Convex Population Risk

arXiv:2012.02456v43.32 citations · Has Code

Originality: Incremental advance

AI Analysis

This work provides a theoretical understanding of generalization for models trained when the population risk is only locally strongly convex. It is relevant to researchers working on the theoretical foundations of machine learning, especially non-convex optimization.

This paper establishes upper bounds on the expected excess risk of models trained by iterative algorithms, requiring only local strong convexity of the population risk around its local minima. For convex problems, the bound is of order Õ(1/n). For non-convex problems, the Õ(1/n) order is maintained if d/n is below a threshold independent of n and the empirical risk has no spurious local minima with high probability; otherwise, the bound is Õ(1/√n).
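For reference, the quantities in these bounds can be written in standard notation. This is an assumed formalization: the symbols $f$, $R$, $R_S$, $\theta_T$, $\theta^*$, $\lambda$, $r$ and $\mu_{\mathrm{PL}}$ are ours rather than the paper's, and the paper's exact assumptions may differ. The point of contrast is that the PL inequality must hold at every $\theta$, whereas local strong convexity is only required on a neighborhood of each local minimum.

```latex
% Population and empirical risk for a loss f, data z ~ D, sample S = (z_1, ..., z_n)
R(\theta) = \mathbb{E}_{z \sim \mathcal{D}}\left[ f(\theta; z) \right],
\qquad
R_S(\theta) = \frac{1}{n} \sum_{i=1}^{n} f(\theta; z_i)

% Expected excess risk of the iterate \theta_T returned by algorithm A, relative to a minimizer \theta^*
\mathbb{E}_{S, \mathcal{A}}\left[ R(\theta_T) - R(\theta^*) \right]

% Local strong convexity around a local minimum \theta^* (radius r > 0, modulus \lambda > 0)
\nabla^2 R(\theta) \succeq \lambda I
\quad \text{for all } \theta \text{ with } \|\theta - \theta^*\| \le r

% Global PL inequality (must hold for every \theta), with R^* = \inf_\theta R(\theta)
\frac{1}{2} \|\nabla R(\theta)\|^2 \ge \mu_{\mathrm{PL}} \left( R(\theta) - R^* \right)
```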

We establish upper bounds for the expected excess risk of models trained by proper iterative algorithms that approximate the local minima. Unlike results built upon global strong convexity or global growth conditions, e.g., the PL inequality, we only require the population risk to be \emph{locally} strongly convex around its local minima. Concretely, our bound for convex problems is of order $\tilde{\mathcal{O}}(1/n)$. For non-convex problems with $d$ model parameters such that $d/n$ is smaller than a threshold independent of $n$, the order $\tilde{\mathcal{O}}(1/n)$ can be maintained if the empirical risk has no spurious local minima with high probability. Without this assumption, the bound for non-convex problems becomes $\tilde{\mathcal{O}}(1/\sqrt{n})$. Our results are derived via algorithmic stability and a characterization of the empirical risk's landscape. Compared with existing results based on algorithmic stability, our bounds are dimension-insensitive and impose no restrictions on the algorithm's implementation, learning rate, or number of iterations. Our bounds underscore that, when the population risk is locally strongly convex, models trained by any proper iterative algorithm can generalize well, even for non-convex problems with large $d$.
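As a rough illustration of the Õ(1/n) claim in a non-convex setting, here is a minimal sketch built on a hypothetical one-dimensional toy loss that does not come from the paper: f(θ; x) = (θ² − x)²/4 with x ~ N(μ, σ²) and μ > 0. Its population risk is non-convex (θ = 0 is a local maximum) but locally strongly convex around its minima ±√μ, where the second derivative equals 2μ, and the empirical risk has no spurious local minima whenever the sample mean x̄ is positive. Plain gradient descent stands in for a "proper iterative algorithm"; the function names, step size, iteration count, and trial counts are arbitrary assumptions. For this loss the excess risk of the converged iterate is (x̄ − μ)²/4, so its expectation should be close to σ²/(4n).

```python
import random


def excess_population_risk(theta: float, mu: float) -> float:
    # Hypothetical toy loss f(theta; x) = (theta**2 - x)**2 / 4 with E[x] = mu > 0.
    # R(theta) = ((theta**2 - mu)**2 + Var[x]) / 4, so R* = Var[x] / 4 and
    # the excess risk R(theta) - R* equals (theta**2 - mu)**2 / 4.
    return (theta**2 - mu) ** 2 / 4.0


def gradient_descent(xbar: float, theta0: float = 1.0, lr: float = 0.1, steps: int = 300) -> float:
    # Gradient of the empirical risk (1/n) * sum_i (theta**2 - x_i)**2 / 4
    # is (theta**2 - xbar) * theta, so it depends on the sample only via xbar.
    theta = theta0
    for _ in range(steps):
        theta -= lr * (theta**2 - xbar) * theta
    return theta


def expected_excess_risk(n: int, trials: int, mu: float = 1.0, sigma: float = 1.0, seed: int = 0) -> float:
    # Monte Carlo estimate of E_S[R(theta_hat) - R*] over independent samples S of size n,
    # where theta_hat is the gradient-descent output on the empirical risk of S.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        theta_hat = gradient_descent(xbar)
        total += excess_population_risk(theta_hat, mu)
    return total / trials


if __name__ == "__main__":
    for n in (100, 1000):
        risk = expected_excess_risk(n, trials=1000)
        # For this toy, E[excess] = sigma**2 / (4 n), so n * risk should be near 0.25.
        print(f"n={n:5d}  E[excess] ~ {risk:.6f}  n * E[excess] ~ {n * risk:.3f}")
```

With μ = σ = 1, the scaled quantity n·E[excess] should stay near 0.25 for both sample sizes, i.e., the estimate shrinks roughly like 1/n. This toy case says nothing about the d/n threshold or the Õ(1/√n) regime discussed in the abstract.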
