LG, ML · May 17, 2025

Accelerating Neural Network Training Along Sharp and Flat Directions

ETH Zurich

arXiv:2505.11972v1 · 4.1 · h-index: 11

Originality: Incremental advance

AI Analysis

This work addresses the design of curvature-aware optimizers, a topic relevant to machine learning practitioners, but it is an incremental advance because it builds on existing subspace decomposition methods.

The paper tackles the problem of accelerating neural network training by analyzing updates along sharp and flat directions of the Hessian spectrum. It shows that restricting updates to flatter directions can speed up convergence but may reduce stability, and it introduces interpolated gradient methods to balance these two effects.
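A minimal sketch of the subspace split behind these update rules, assuming plain NumPy, an explicitly formed Hessian (feasible only for toy problems), and "top eigenspace" meaning the eigenvectors of the k largest eigenvalues. The function names (`dominant_projector`, `split_gradient`, `interpolated_update`) and the two-weight interpolation (`dom_weight`, `bulk_weight`) are hypothetical illustrations, not the paper's implementation; the paper's interpolated methods may be parameterized differently.

```python
import numpy as np


def dominant_projector(hessian, k):
    """Orthogonal projector onto the top-k eigenspace of a symmetric Hessian.

    Assumption: "top" means the k largest eigenvalues, and the Hessian is
    formed explicitly, which is only practical for very small models.
    """
    _, eigvecs = np.linalg.eigh(hessian)  # eigh returns eigenvalues in ascending order
    top = eigvecs[:, -k:]                 # eigenvectors of the k largest eigenvalues
    return top @ top.T


def split_gradient(grad, proj):
    """Split a gradient into Dominant and Bulk components that sum to grad."""
    g_dom = proj @ grad
    g_bulk = grad - g_dom  # component in the orthogonal complement (Bulk subspace)
    return g_dom, g_bulk


def interpolated_update(grad, proj, lr, dom_weight, bulk_weight):
    """Hypothetical two-weight interpolation between the three update rules.

    (1, 1) recovers SGD, (1, 0) Dom-SGD, and (0, 1) Bulk-SGD.
    """
    g_dom, g_bulk = split_gradient(grad, proj)
    return -lr * (dom_weight * g_dom + bulk_weight * g_bulk)


if __name__ == "__main__":
    # Toy quadratic loss L(w) = 0.5 * w^T H w with two sharp and three flat directions.
    hessian = np.diag([10.0, 5.0, 0.5, 0.2, 0.1])
    proj = dominant_projector(hessian, k=2)
    rules = {"SGD": (1, 1), "Dom-SGD": (1, 0), "Bulk-SGD": (0, 1)}
    for name, (a, b) in rules.items():
        w = np.ones(5)
        for _ in range(50):
            grad = hessian @ w  # exact gradient of the quadratic
            w = w + interpolated_update(grad, proj, lr=0.1, dom_weight=a, bulk_weight=b)
        print(name, np.round(w, 3))
```

On this toy quadratic, Dom-SGD only moves the two sharp coordinates and Bulk-SGD only moves the three flat ones, so each printed vector keeps its untouched coordinates at the initial value of 1 (up to floating-point round-off). The quadratic only shows which directions each rule moves; it does not reproduce the paper's acceleration or stability findings. For a real network the Hessian is never formed explicitly; the top eigenvectors would typically be estimated from Hessian-vector products (for example with Lanczos or power iteration) and refreshed periodically during training.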

Recent work has highlighted a surprising alignment between gradients and the top eigenspace of the Hessian (termed the Dominant subspace) during neural network training. Concurrently, there has been growing interest in the distinct roles of sharp and flat directions in the Hessian spectrum. In this work, we study Bulk-SGD, a variant of SGD that restricts updates to the orthogonal complement of the Dominant subspace. Through ablation studies, we characterize the stability properties of Bulk-SGD and identify critical hyperparameters that govern its behavior. We show that updates along the Bulk subspace, corresponding to flatter directions in the loss landscape, can accelerate convergence but may compromise stability. To balance these effects, we introduce interpolated gradient methods that unify SGD, Dom-SGD (which restricts updates to the Dominant subspace), and Bulk-SGD. Finally, we empirically connect this subspace decomposition to the Generalized Gauss-Newton and Functional Hessian terms, showing that curvature energy is largely concentrated in the Dominant subspace. Our findings suggest a principled approach to designing curvature-aware optimizers.
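The Generalized Gauss-Newton (GGN) and Functional Hessian terms come from the standard Gauss-Newton decomposition of the loss Hessian. For a loss L(θ) = (1/n) Σ_i ℓ(f(x_i; θ), y_i) with network outputs f ∈ R^C and per-example Jacobians J_i = ∂f(x_i; θ)/∂θ, the Hessian splits as in the first line below. The second line is one plausible, assumed way to measure how much of a curvature matrix M lies in the Dominant subspace, using the projector P_k = U_k U_k^T onto the top-k Hessian eigenvectors; the paper's exact definition of curvature energy may differ.

```latex
\begin{aligned}
\nabla_\theta^2 L(\theta)
  &= \underbrace{\frac{1}{n}\sum_{i=1}^{n} J_i^\top \,\nabla_f^2 \ell_i \, J_i}_{H_{\mathrm{GGN}}}
  \;+\; \underbrace{\frac{1}{n}\sum_{i=1}^{n}\sum_{c=1}^{C}
        \frac{\partial \ell_i}{\partial f_c}\,\nabla_\theta^2 f_c(x_i;\theta)}_{H_{\mathrm{F}}},
  \qquad \ell_i = \ell\big(f(x_i;\theta),\, y_i\big) \\
\mathcal{E}_k(M)
  &= \frac{\lVert P_k M P_k \rVert_F^2}{\lVert M \rVert_F^2} \in [0, 1],
  \qquad M \in \{H_{\mathrm{GGN}},\, H_{\mathrm{F}}\}
\end{aligned}
```

For squared-error loss ℓ_i = ½‖f(x_i; θ) − y_i‖², the output Hessian ∇_f² ℓ_i is the identity, so H_GGN reduces to (1/n) Σ_i J_i^T J_i, and H_F vanishes whenever every residual f(x_i; θ) − y_i is zero.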
