LGPRMLApr 11, 2025

In almost all shallow analytic neural network optimization landscapes, efficient minimizers have strongly convex neighborhoods

arXiv:2504.08867v1h-index: 3
Originality Incremental advance
AI Analysis

This provides theoretical insights into the convergence behavior of optimizers for shallow neural networks, which is incremental but clarifies conditions for efficient training.

The paper analyzes the optimization landscape of shallow neural networks with analytic activation functions for regression, showing that in almost all regression problems, local minima in the efficient domain are strongly convex, which influences optimizer convergence rates.

Whether or not a local minimum of a cost function has a strongly convex neighborhood greatly influences the asymptotic convergence rate of optimizers. In this article, we rigorously analyze the prevalence of this property for the mean squared error induced by shallow, 1-hidden layer neural networks with analytic activation functions when applied to regression problems. The parameter space is divided into two domains: the 'efficient domain' (all parameters for which the respective realization function cannot be generated by a network having a smaller number of neurons) and the 'redundant domain' (the remaining parameters). In almost all regression problems on the efficient domain the optimization landscape only features local minima that are strongly convex. Formally, we will show that for certain randomly picked regression problems the optimization landscape is almost surely a Morse function on the efficient domain. The redundant domain has significantly smaller dimension than the efficient domain and on this domain, potential local minima are never isolated.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes