LG | Jan 19, 2023

Convergence beyond the over-parameterized regime using Rayleigh quotients

arXiv:2301.08117v15 citations | h-index: 32
Originality: Incremental advance
AI Analysis

This work addresses the theoretical problem of proving convergence in deep learning. It offers a unified approach that is an incremental advance but extends the analysis to test loss minimization.

The paper studies how to prove that deep learning architectures trained by gradient flow converge to zero training or testing loss. Its strategy uses Rayleigh quotients to establish Kurdyka-Łojasiewicz inequalities for a broader set of architectures and loss functions. The analysis extends beyond the over-parameterized regime: it requires neither the number of parameters to tend to infinity nor the number of samples to be finite.

In this paper, we present a new strategy to prove the convergence of deep learning architectures to a zero training (or even testing) loss by gradient flow. Our analysis is centered on the notion of Rayleigh quotients in order to prove Kurdyka-Łojasiewicz inequalities for a broader set of neural network architectures and loss functions. We show that Rayleigh quotients provide a unified view for several convergence analysis techniques in the literature. Our strategy produces a proof of convergence for various examples of parametric learning. In particular, our analysis does not require the number of parameters to tend to infinity, nor the number of samples to be finite, thus extending to test loss minimization and beyond the over-parameterized regime.
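To make the link between Rayleigh quotients and convergence concrete, here is a minimal sketch in the simplest setting: squared loss on finitely many samples, which is narrower than the general losses and the test-loss setting the abstract covers. The symbols $r$ (residual), $J$ (Jacobian of the model outputs with respect to the parameters), $\mathcal{R}$ (Rayleigh quotient of $JJ^\top$ at $r$), and $\mu$ (an assumed uniform lower bound) are notation chosen for this illustration, not taken from the paper. The resulting bound is the Polyak-Łojasiewicz special case of a Kurdyka-Łojasiewicz inequality.

```latex
% Illustrative sketch: gradient flow with squared loss (assumed notation).
\begin{align*}
\dot\theta(t) &= -\nabla L(\theta(t)),
  \qquad L(\theta) = \tfrac12 \lVert r(\theta) \rVert^2,
  \qquad r(\theta) = f_\theta - y, \\
\nabla L(\theta) &= J(\theta)^\top r(\theta),
  \qquad J(\theta) = \partial_\theta f_\theta, \\
\frac{d}{dt} L(\theta(t)) &= -\lVert \nabla L(\theta(t)) \rVert^2
  = -\, r^\top J J^\top r
  = -\, 2\, \mathcal{R}(\theta)\, L(\theta),
  \qquad \mathcal{R}(\theta) = \frac{r^\top J J^\top r}{r^\top r}.
\end{align*}
% Assumption: \mathcal{R}(\theta(t)) \ge \mu > 0 along the whole trajectory.
% Then \lVert \nabla L \rVert^2 \ge 2 \mu L, and Gronwall's lemma gives
\[
L(\theta(t)) \le L(\theta(0))\, e^{-2 \mu t}.
\]
```

In this sketch the whole convergence argument reduces to lower-bounding one Rayleigh quotient along the trajectory. Nothing in the computation forces the number of parameters to grow, provided such a bound can be established by other means.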
