LG · Oct 13, 2025

Nonlinear discretizations and Newton's method: characterizing stationary points of regression objectives

arXiv:2510.11987
Originality: Incremental advance
AI Analysis

This work addresses a fundamental issue in neural network optimization and offers insights that could influence both training strategies and theoretical understanding, though its contribution, questioning existing assumptions, is incremental.

The paper investigates the failure of exact second-order methods in neural network training. It shows that training fails reliably when the true Hessian is used in place of an approximation, and the failure modes lead the authors to question the common belief that the loss landscape is filled with local minima.

Second-order methods are emerging as promising alternatives to standard first-order optimizers such as gradient descent and Adam for training neural networks. Though the advantages of including curvature information in computing optimization steps have been celebrated in the scientific machine learning literature, the only second-order methods that have been studied are quasi-Newton, meaning that the Hessian matrix of the objective function is approximated. Though one would expect only to gain from using the true Hessian in place of its approximation, we show that neural network training reliably fails when relying on exact curvature information. The failure modes provide insight into both the geometry of nonlinear discretizations and the distribution of stationary points in the loss landscape, leading us to question the conventional wisdom that the loss landscape is replete with local minima.
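
A minimal sketch of one way exact curvature can mislead an optimizer: Newton's method solves for a zero of the gradient, so it is attracted to any nondegenerate stationary point, including saddles. The toy objective L(a, b) = 0.5 (ab - 1)^2, the starting point, the step counts, and the learning rate below are all assumptions chosen for illustration. They are not taken from the paper, and the example is not the authors' experiment.

```python
# Toy objective (illustrative assumption, not the paper's setup):
# fit the target value 1 with the product a*b.

def loss(a, b):
    return 0.5 * (a * b - 1.0) ** 2

def grad(a, b):
    r = a * b - 1.0                      # residual
    return (b * r, a * r)

def hessian(a, b):
    off = 2.0 * a * b - 1.0              # mixed second derivative
    return ((b * b, off), (off, a * a))

def newton_step(a, b):
    """One full Newton step using the exact 2x2 Hessian."""
    g1, g2 = grad(a, b)
    (h11, h12), (_, h22) = hessian(a, b)
    det = h11 * h22 - h12 * h12
    # Solve H d = g via the explicit inverse of a symmetric 2x2 matrix.
    d1 = (h22 * g1 - h12 * g2) / det
    d2 = (h11 * g2 - h12 * g1) / det
    return a - d1, b - d2

def gd_step(a, b, lr=0.1):
    """One plain gradient-descent step."""
    g1, g2 = grad(a, b)
    return a - lr * g1, b - lr * g2

def run(step, a, b, n):
    for _ in range(n):
        a, b = step(a, b)
    return a, b

a0, b0 = 0.3, 0.2                        # hypothetical starting point
a_n, b_n = run(newton_step, a0, b0, 8)   # exact Newton
a_g, b_g = run(gd_step, a0, b0, 500)     # gradient descent

(h11, h12), (_, h22) = hessian(a_n, b_n)
print("Newton end point:", (a_n, b_n), "loss:", loss(a_n, b_n))
print("Hessian determinant at Newton end point:", h11 * h22 - h12 * h12)
print("Gradient descent end point:", (a_g, b_g), "loss:", loss(a_g, b_g))
```

From this starting point, Newton's method converges to the origin, a saddle where the loss is 0.5 and the Hessian is indefinite (negative determinant), while gradient descent reaches the set ab = 1 where the loss vanishes. In this toy problem the Hessian determinant equals (1 - ab)(3ab - 1), so the exact Hessian is singular everywhere on the minimizing set, which is a second way a full Newton step can become ill-posed.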

