Global Convergence and Error Propagation in Neural Gradient Flows: A Riemannian Optimization Framework

arXiv:2605.27779 | 95.9 | h-index: 3

AI Analysis

This work provides a theoretical convergence guarantee for neural network optimization using Riemannian geometry, which is significant for understanding and improving optimization in deep learning, though the assumptions (e.g., C^2 network, non-degenerate Jacobian) are restrictive.

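The paper develops a geometric convergence theory for neural-network optimization within the minimizing movement scheme (MMS) framework. Under a strict interior-localization condition and an explicit data condition, it proves that the Riemannian gradient flow and its discrete exponential-map counterpart converge linearly to the unique minimizer of each MMS subproblem, and that the inexact neural iterates converge to an O(δ)-neighborhood of the global minimum. Numerical experiments on nonlinear regression and a small-scale latent-diffusion testbed indicate that a Gauss-Newton-type inner solver achieves smaller trajectory errors with substantially fewer inner iterations than first-order baselines.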
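For orientation, the two ingredients behind the linear-rate claim can be written in generic form. This is a sketch under assumed notation, not the paper's statement: the energy $E$, step size $\tau$, subproblem objective $F$, strong-convexity constant $\mu$, and distance $d$ are symbols introduced here for illustration. The first line is the usual form of a minimizing movement step; the second is the standard decay estimate for the gradient flow of a geodesically $\mu$-strongly convex function with minimizer $x^\ast$, which follows from the inequality $\|\operatorname{grad} F\|^2 \ge 2\mu\,(F - F(x^\ast))$.

```latex
% One MMS step (assumed generic form):
u_{k+1} \in \operatorname*{arg\,min}_{u}
  \Big\{ E(u) + \frac{1}{2\tau}\,\| u - u_k \|^2 \Big\}

% Gradient flow \dot{x}(t) = -\operatorname{grad} F(x(t)) with F geodesically
% \mu-strongly convex on a geodesically convex set containing the trajectory:
F(x(t)) - F(x^\ast) \le e^{-2\mu t}\,\big( F(x(0)) - F(x^\ast) \big),
\qquad
d\big(x(t), x^\ast\big) \le e^{-\mu t}\, d\big(x(0), x^\ast\big)
```

In continuous time, "linear convergence" therefore means exponential decay in $t$; the discrete exponential-map iteration would contract by a fixed factor per step under a suitable step size.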

We develop a geometric convergence theory for neural-network optimization within the minimizing movement scheme (MMS) framework. Reformulating each neural MMS step as a minimization over the set of increments in a Hilbert space, we show that under a $C^2$ network with locally non-degenerate Jacobian this increment set is a boundaryless smooth embedded submanifold, on which a natural preconditioned (Gauss--Newton-type) gradient flow in parameter space induces exactly the Riemannian gradient flow. Under a strict interior-localization condition and an explicit data condition, the reached sublevel set is geodesically convex and the subproblem objective is geodesically strongly convex on it; both the continuous Riemannian gradient flow and its discrete companion via the exponential map converge linearly to the unique subproblem minimizer. Propagating finite-time inner-solver inexactness and neural-approximation error through the MMS iterations yields a uniform function-space tracking bound and an explicit trajectory budget, so the inexact neural iterates converge to an $O(\delta)$-neighborhood of the global minimum. Numerical experiments on nonlinear regression and a small-scale latent-diffusion testbed indicate that the Gauss--Newton-type inner solver achieves smaller trajectory errors with substantially fewer inner iterations than first-order baselines.
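To make the idea of a Gauss-Newton-type inner solver for one MMS step concrete, here is a minimal NumPy sketch on a toy regression problem. Everything in it is an illustrative assumption rather than the paper's setup: the one-hidden-layer tanh model, the least-squares energy, the damping term, the backtracking rule, and all function names (`model`, `jacobian`, `mms_objective`, `gauss_newton_mms_step`) are hypothetical. It shows only the general pattern of preconditioning the parameter gradient with the damped Gram matrix of the network Jacobian.

```python
import numpy as np

def model(theta, x, m):
    """Toy network f_theta(x) = sum_j a_j * tanh(w_j * x + b_j)."""
    a, w, b = theta[:m], theta[m:2 * m], theta[2 * m:]
    return np.tanh(np.outer(x, w) + b) @ a

def jacobian(theta, x, m):
    """Jacobian of the outputs w.r.t. theta = (a, w, b), shape (n, 3m)."""
    a, w, b = theta[:m], theta[m:2 * m], theta[2 * m:]
    h = np.tanh(np.outer(x, w) + b)      # hidden activations, (n, m)
    dh = (1.0 - h ** 2) * a              # a_j * tanh'(.), (n, m)
    return np.hstack([h, dh * x[:, None], dh])

def mms_objective(u, u_prev, y, tau):
    """Assumed MMS subproblem: least-squares energy plus proximal term."""
    return 0.5 * np.mean((u - y) ** 2) + np.mean((u - u_prev) ** 2) / (2.0 * tau)

def gauss_newton_mms_step(theta, x, y, tau, m, iters=20, damping=1e-6):
    """Approximately solve one MMS subproblem by damped Gauss-Newton descent."""
    n = x.size
    u_prev = model(theta, x, m)          # previous iterate in function space
    for _ in range(iters):
        u = model(theta, x, m)
        J = jacobian(theta, x, m)
        grad_u = ((u - y) + (u - u_prev) / tau) / n   # gradient w.r.t. outputs
        g = J.T @ grad_u                               # gradient w.r.t. theta
        G = J.T @ J / n + damping * np.eye(theta.size) # damped Gram matrix
        d = -np.linalg.solve(G, g)                     # preconditioned direction
        f0 = mms_objective(u, u_prev, y, tau)
        step = 1.0
        while step > 1e-8:               # backtracking: accept only decreases
            cand = theta + step * d
            if mms_objective(model(cand, x, m), u_prev, y, tau) < f0:
                theta = cand
                break
            step *= 0.5
        else:
            break                        # no decreasing step found; stop early
    return theta, u_prev

# Toy data (hypothetical): fit sin(pi * x) on [-1, 1].
m = 5
x = np.linspace(-1.0, 1.0, 40)
y = np.sin(np.pi * x)
rng = np.random.default_rng(0)
theta0 = 0.5 * rng.standard_normal(3 * m)

tau = 0.5
theta1, u_prev = gauss_newton_mms_step(theta0, x, y, tau, m)
init_obj = mms_objective(u_prev, u_prev, y, tau)
final_obj = mms_objective(model(theta1, x, m), u_prev, y, tau)
print(f"MMS objective: {init_obj:.4f} -> {final_obj:.4f}")
```

Calling `gauss_newton_mms_step` repeatedly, each time starting from the previous output, would produce a sequence of MMS iterates. A first-order baseline in the same setting would replace `d` with `-g`, which is the comparison the abstract's experiments describe at a much larger scale.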

