LG · Nov 16, 2025

CAO: Curvature-Adaptive Optimization via Periodic Low-Rank Hessian Sketching

arXiv:2511.12548v1

Originality: Incremental advance

AI Analysis

This work addresses the need for faster optimization in deep learning, particularly for non-convex objectives. It is incremental in that it builds on existing Hessian-based preconditioning methods.

The paper tackles the slow convergence of first-order optimizers in sharp, anisotropic regions. It proposes a curvature-adaptive method that periodically sketches a low-rank Hessian subspace and preconditions gradients only within it. On CIFAR-100/ResNet-18, the method reaches a pre-declared train-loss threshold 2.95x faster than Adam (measured in epochs) while matching final test accuracy.

First-order optimizers are reliable but slow in sharp, anisotropic regions. We study a curvature-adaptive method that periodically sketches a low-rank Hessian subspace via Hessian-vector products and preconditions gradients only in that subspace, leaving the orthogonal complement first-order. For L-smooth non-convex objectives, we recover the standard O(1/T) stationarity guarantee with a widened stable stepsize range; under a Polyak-Lojasiewicz (PL) condition with bounded residual curvature outside the sketch, the loss contracts at refresh steps. On CIFAR-10/100 with ResNet-18/34, the method enters the low-loss region substantially earlier: measured by epochs to a pre-declared train-loss threshold (0.75), it reaches the threshold 2.95x faster than Adam on CIFAR-100/ResNet-18, while matching final test accuracy. The approach is one-knob: performance is insensitive to the sketch rank k across {1,3,5}, and k=0 yields a principled curvature-free ablation. We release anonymized logs and scripts that regenerate all figures and tables.
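
The idea described in the abstract (sketch a rank-k Hessian subspace from Hessian-vector products, precondition the gradient only inside it, and refresh the sketch periodically) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the finite-difference Hessian-vector product, the subspace-iteration sketch, the damping term, and the refresh schedule are all assumptions chosen for illustration; the paper may use different choices for each.

```python
import numpy as np

def hvp_fd(grad_fn, x, v, eps=1e-5):
    # Assumption: central finite difference of the gradient stands in for
    # an autodiff Hessian-vector product.
    return (grad_fn(x + eps * v) - grad_fn(x - eps * v)) / (2.0 * eps)

def sketch_subspace(grad_fn, x, k, rng, n_iter=4):
    # Approximate the top-k Hessian eigenpairs at x using only HVPs
    # (randomized subspace iteration followed by a Rayleigh-Ritz step).
    d = x.size
    if k == 0:
        # Curvature-free case: empty subspace.
        return np.zeros((d, 0)), np.zeros(0)

    def h_times(q):
        cols = [hvp_fd(grad_fn, x, q[:, j]) for j in range(q.shape[1])]
        return np.stack(cols, axis=1)

    q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    for _ in range(n_iter):
        q, _ = np.linalg.qr(h_times(q))
    t = q.T @ h_times(q)
    lam, w = np.linalg.eigh(0.5 * (t + t.T))  # symmetrize before eigh
    return q @ w, lam

def subspace_preconditioned_step(x, g, u, lam, lr, damping=1e-3):
    # Inside span(u): rescale by 1 / (|lambda| + damping) (hypothetical choice).
    # Orthogonal complement: plain first-order gradient step.
    coeff = u.T @ g
    g_perp = g - u @ coeff
    g_sub = u @ (coeff / (np.abs(lam) + damping))
    return x - lr * (g_perp + g_sub)

def run(grad_fn, x0, k, lr, steps, refresh_every=5, seed=0):
    # Periodically refresh the sketch; reuse it between refreshes.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    u, lam = sketch_subspace(grad_fn, x, k, rng)
    for t in range(steps):
        if t > 0 and t % refresh_every == 0:
            u, lam = sketch_subspace(grad_fn, x, k, rng)
        x = subspace_preconditioned_step(x, grad_fn(x), u, lam, lr)
    return x
```

With k=0 the subspace is empty and the update reduces to plain gradient descent, which mirrors the curvature-free ablation mentioned in the abstract. On an ill-conditioned quadratic whose two largest eigenvalues are captured by the sketch, this toy version tolerates a step size at which plain gradient descent would diverge, which is one way to read the "widened stable stepsize range" claim.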
