LG AI · Feb 25, 2025

SASSHA: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation

arXiv:2502.18153v20.023 citations · h-index: 2 · ICML
AI Analysis 45

This paper addresses a generalization bottleneck in optimization that affects deep learning practitioners, though it appears incremental: it builds on existing second-order methods and adds sharpness-aware modifications.

The paper tackles the problem that approximate second-order optimization methods often generalize worse than first-order methods. It finds that these methods converge to sharper minima and proposes SASSHA, a second-order method that explicitly reduces solution sharpness while stabilizing Hessian approximations. In deep learning experiments, SASSHA achieves generalization performance comparable to or better than other methods.

Approximate second-order optimization methods often exhibit poorer generalization compared to first-order approaches. In this work, we look into this issue through the lens of the loss landscape and find that existing second-order methods tend to converge to sharper minima compared to SGD. In response, we propose Sassha, a novel second-order method designed to enhance generalization by explicitly reducing sharpness of the solution, while stabilizing the computation of approximate Hessians along the optimization trajectory. In fact, this sharpness minimization scheme is crafted also to accommodate lazy Hessian updates, so as to secure efficiency besides flatness. To validate its effectiveness, we conduct a wide range of standard deep learning experiments where Sassha demonstrates its outstanding generalization performance that is comparable to, and mostly better than, other methods. We provide a comprehensive set of analyses including convergence, robustness, stability, efficiency, and cost.
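The abstract names three ingredients: a sharpness-reducing step, a stabilized approximate Hessian, and lazy Hessian updates. The following is a minimal sketch of how such ingredients could fit together on a toy quadratic loss; it is not the authors' implementation. Everything specific here is an assumption made for illustration: the SAM-style perturbation of radius `rho`, the Hutchinson diagonal estimator, the absolute-value-plus-square-root stabilization, the refresh interval `hessian_every`, and all function names.

```python
import numpy as np

def loss_and_grad(w, A, b):
    """Toy quadratic loss 0.5 * w^T A w - b^T w and its gradient A w - b."""
    return 0.5 * w @ A @ w - b @ w, A @ w - b

def hutchinson_diag(hvp, dim, rng, n_samples=8):
    """Estimate diag(H) by averaging z * (H z) over Rademacher vectors z."""
    est = np.zeros(dim)
    for _ in range(n_samples):
        z = rng.choice([-1.0, 1.0], size=dim)
        est += z * hvp(z)
    return est / n_samples

def sharpness_aware_second_order(A, b, w0, steps=200, lr=0.1, rho=0.05,
                                 hessian_every=10, eps=1e-8, seed=0):
    """Hypothetical sketch: perturbed gradient + lazily refreshed diagonal preconditioner."""
    rng = np.random.default_rng(seed)
    w = w0.astype(float).copy()
    precond = np.ones_like(w)
    for t in range(steps):
        _, g = loss_and_grad(w, A, b)
        # Assumed sharpness step: move to a nearby point along the gradient
        # direction and take the gradient there.
        w_adv = w + rho * g / (np.linalg.norm(g) + eps)
        _, g_adv = loss_and_grad(w_adv, A, b)
        # Lazy Hessian update: refresh the diagonal estimate only every
        # `hessian_every` steps and reuse it in between.
        if t % hessian_every == 0:
            # The Hessian of this quadratic is the constant matrix A, so the
            # evaluation point does not matter in this toy setting.
            d = hutchinson_diag(lambda v: A @ v, w.size, rng)
            # Assumed stabilization: absolute value and square root keep the
            # preconditioner positive and damp extreme curvature estimates.
            precond = np.sqrt(np.abs(d)) + eps
        w = w - lr * g_adv / precond
    return w

# Usage on a small, well-conditioned problem.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, 1.0])
w0 = np.array([2.0, -2.0])
w = sharpness_aware_second_order(A, b, w0)
print("final loss:", loss_and_grad(w, A, b)[0])
```

Because the perturbation has fixed radius `rho`, the iterate settles in a small neighborhood of the minimizer rather than exactly on it. On a convex quadratic there is only one minimum, so the sketch shows the mechanics of the update and not the flat-versus-sharp selection effect the paper studies.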

Code Implementations: 1 repo
