Asymptotic Smoothing of the Lipschitz Loss Landscape in Overparameterized One-Hidden-Layer ReLU Networks
This work addresses non-convex optimization in neural networks and is aimed at researchers studying loss landscapes. It shows that overparameterization can mitigate the problem of poor local minima. The contribution is incremental, since it builds on prior results for specific network types.
The paper proves that, for overparameterized one-hidden-layer ReLU networks with convex Lipschitz losses and ℓ₁ regularization, any two models at the same loss level can be connected by a path along which the loss increases by an arbitrarily small amount. It also shows that the energy gap between local and global minima vanishes as the width increases, so the landscape becomes asymptotically smooth. Empirical results on synthetic and real datasets confirm that wider networks have smaller energy gaps; a permutation test yields p=0, indicating reduced barriers.
We study the topology of the loss landscape of one-hidden-layer ReLU networks under overparameterization. On the theory side, we (i) prove that for convex $L$-Lipschitz losses with an $\ell_1$-regularized second layer, every pair of models at the same loss level can be connected by a continuous path along which the loss increases by at most an arbitrarily small $\varepsilon$ (extending a known result for the quadratic loss); and (ii) obtain an asymptotic upper bound on the energy gap $\varepsilon$ between local and global minima that vanishes as the width $m$ grows, implying that the landscape flattens and sublevel sets become connected in the limit. Empirically, on a synthetic Moons dataset and on the Wisconsin Breast Cancer dataset, we measure pairwise energy gaps via Dynamic String Sampling (DSS) and find that wider networks exhibit smaller gaps; in particular, a permutation test on the maximum gap yields $p_{\mathrm{perm}}=0$, indicating a clear reduction in the barrier height.
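To make the reported $p_{\mathrm{perm}}=0$ easier to interpret, the following is a minimal sketch of a one-sided permutation test on the maximum gap. It is not the authors' procedure: the function name `max_gap_permutation_test`, the test statistic (difference of group maxima), the number of permutations, and the treatment of pairwise gaps as exchangeable are all assumptions made for illustration.

```python
import numpy as np


def max_gap_permutation_test(gaps_narrow, gaps_wide, n_perm=10_000, seed=0):
    """One-sided permutation test on the maximum energy gap (illustrative).

    Null hypothesis: width has no effect, so gap labels are exchangeable.
    Statistic (assumed): max(gaps_narrow) - max(gaps_wide).
    Returns the observed statistic, the raw p-value, and the add-one
    corrected p-value.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(gaps_narrow, dtype=float)
    b = np.asarray(gaps_wide, dtype=float)

    observed = a.max() - b.max()
    pooled = np.concatenate([a, b])

    count = 0
    for _ in range(n_perm):
        # Shuffle the pooled gaps and split them into two groups of the
        # original sizes, as if the width labels were assigned at random.
        perm = rng.permutation(pooled)
        stat = perm[: a.size].max() - perm[a.size:].max()
        if stat >= observed:
            count += 1

    # Raw estimate: can be exactly 0 when no permutation reaches the
    # observed statistic.
    p_raw = count / n_perm
    # Add-one correction: never below 1 / (n_perm + 1).
    p_corrected = (count + 1) / (n_perm + 1)
    return observed, p_raw, p_corrected
```

Under these assumptions, a raw value of exactly zero means only that none of the sampled permutations produced a statistic at least as extreme as the observed one; the add-one corrected estimate is bounded below by $1/(n_{\mathrm{perm}}+1)$, so the resolution of the test is set by the number of permutations.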