LG · OC · Aug 17, 2023

Dual Gauss-Newton Directions for Deep Learning

arXiv:2308.08886v2 · 1 citation · h-index: 29
Originality: Incremental advance
AI Analysis

This work addresses optimization efficiency for deep learning practitioners, but it is incremental as it builds on existing Gauss-Newton-like methods.

The paper tackles the problem of improving optimization in deep learning by proposing dual Gauss-Newton direction oracles as drop-in replacements for stochastic gradients, demonstrating computational benefits and new insights through empirical study.

Inspired by Gauss-Newton-like methods, we study the benefit of leveraging the structure of deep learning objectives, namely, the composition of a convex loss function and of a nonlinear network, in order to derive better direction oracles than stochastic gradients, based on the idea of partial linearization. In a departure from previous works, we propose to compute such direction oracles via their dual formulation, leading to both computational benefits and new insights. We demonstrate that the resulting oracles define descent directions that can be used as a drop-in replacement for stochastic gradients, in existing optimization algorithms. We empirically study the advantage of using the dual formulation as well as the computational trade-offs involved in the computation of such oracles.
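To make the idea of a dual direction oracle concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. It assumes a single sample, a scalar-output model `tanh(<w, x>)`, the squared loss `0.5 * (z - y)**2`, and a regularized partial-linearization subproblem `min_d loss(f(w) + J d) + ||d||^2 / (2 * gamma)`; the function names and the parameter `gamma` are hypothetical. Under these assumptions the dual problem lives in output space (here a single scalar `alpha`) and has a closed form, and the direction is recovered as `d = -gamma * J^T alpha`.

```python
import math


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def model(w, x):
    """Tiny nonlinear 'network' (assumption): scalar output tanh(<w, x>)."""
    return math.tanh(dot(w, x))


def jacobian(w, x):
    """Derivative of the scalar output w.r.t. w, i.e. a 1 x p Jacobian."""
    s = 1.0 - model(w, x) ** 2
    return [s * xi for xi in x]


def dual_gn_direction(w, x, y, gamma):
    """Direction from the dual of the partially linearized problem.

    Primal (assumed form): min_d 0.5*(f + J d - y)^2 + ||d||^2 / (2*gamma).
    Dual: max_a -(0.5*a^2 + a*y) + a*f - 0.5*gamma*||J^T a||^2,
    whose maximizer for this scalar case is (f - y) / (1 + gamma*||J||^2).
    """
    f = model(w, x)
    J = jacobian(w, x)
    alpha = (f - y) / (1.0 + gamma * dot(J, J))
    # Map the dual solution back to parameter space.
    return [-gamma * alpha * j for j in J]


def primal_stationarity(w, x, y, gamma, d):
    """Gradient of the primal subproblem at d; zero iff d is the primal solution."""
    f = model(w, x)
    J = jacobian(w, x)
    residual = f + dot(J, d) - y
    return [j * residual + di / gamma for j, di in zip(J, d)]


def loss_gradient(w, x, y):
    """Gradient of the original objective 0.5*(f(w) - y)^2 with respect to w."""
    f = model(w, x)
    return [j * (f - y) for j in jacobian(w, x)]


w = [0.5, -0.3, 0.8]
x = [1.0, 2.0, -1.0]
y = 0.2
gamma = 0.7

d = dual_gn_direction(w, x, y, gamma)
print("direction:", d)
print("primal stationarity:", primal_stationarity(w, x, y, gamma, d))
print("slope along d:", dot(loss_gradient(w, x, y), d))
```

The stationarity check confirms that the dual solution also solves the primal subproblem, and the negative slope shows that `d` is a descent direction for the original loss, so it could stand in for a stochastic gradient (with the sign convention `w + d`). With a mini-batch or vector outputs the dual would need a small linear solve instead of a scalar division, and a non-quadratic convex loss would generally need an iterative inner solver.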

