LG, Nov 16, 2025

CAO: Curvature-Adaptive Optimization via Periodic Low-Rank Hessian Sketching

arXiv:2511.12548v1
Originality: Incremental advance
AI Analysis

This paper addresses the need for faster optimization in deep learning, particularly on non-convex objectives. The advance is incremental because it builds on existing Hessian-based preconditioning methods.

The paper targets the slow convergence of first-order optimizers in sharp, anisotropic regions. It proposes a curvature-adaptive method that periodically sketches a low-rank Hessian subspace and preconditions gradients only within that subspace. On CIFAR-100/ResNet-18, the method reaches a pre-declared train-loss threshold (0.75) 2.95x faster than Adam, measured in epochs, while matching final test accuracy.

First-order optimizers are reliable but slow in sharp, anisotropic regions. We study a curvature-adaptive method that periodically sketches a low-rank Hessian subspace via Hessian-vector products and preconditions gradients only in that subspace, leaving the orthogonal complement first-order. For L-smooth non-convex objectives, we recover the standard O(1/T) stationarity guarantee with a widened stable stepsize range; under a Polyak-Łojasiewicz (PL) condition with bounded residual curvature outside the sketch, the loss contracts at refresh steps. On CIFAR-10/100 with ResNet-18/34, the method enters the low-loss region substantially earlier: measured by epochs to a pre-declared train-loss threshold (0.75), it reaches the threshold 2.95x faster than Adam on CIFAR-100/ResNet-18, while matching final test accuracy. The approach is one-knob: performance is insensitive to the sketch rank k across {1,3,5}, and k=0 yields a principled curvature-free ablation. We release anonymized logs and scripts that regenerate all figures and tables.
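
The mechanism described in the abstract can be illustrated with a minimal pure-Python sketch on a toy quadratic. This is not the paper's implementation. Several pieces are assumptions made for illustration: the function names (`hvp`, `sketch`, `step`, `run`), the use of block power iteration to find the subspace, the finite-difference Hessian-vector product, the damped Newton-like scaling `1 / (|lambda| + damping)` inside the subspace, and the toy objective itself. The paper may use different choices for each.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def axpy(a, u, v):
    """Return a*u + v for plain Python lists."""
    return [a * ui + vi for ui, vi in zip(u, v)]

def hvp(grad, x, v, eps=1e-4):
    """Hessian-vector product by central difference of gradients.
    Assumption: a real implementation would likely use autodiff instead."""
    gp = grad(axpy(eps, v, x))
    gm = grad(axpy(-eps, v, x))
    return [(p - m) / (2 * eps) for p, m in zip(gp, gm)]

def orthonormalize(vectors):
    """Modified Gram-Schmidt; drops numerically dependent vectors."""
    basis = []
    for v in vectors:
        for b in basis:
            v = axpy(-dot(b, v), b, v)
        norm = dot(v, v) ** 0.5
        if norm > 1e-12:
            basis.append([vi / norm for vi in v])
    return basis

def sketch(grad, x, k, iters=10, seed=0):
    """Rank-k sketch of the dominant Hessian subspace (illustrative choice:
    block power iteration driven only by Hessian-vector products)."""
    rng = random.Random(seed)
    basis = orthonormalize([[rng.gauss(0, 1) for _ in x] for _ in range(k)])
    for _ in range(iters):
        basis = orthonormalize([hvp(grad, x, b) for b in basis])
    curvatures = [dot(b, hvp(grad, x, b)) for b in basis]  # Rayleigh quotients
    return basis, curvatures

def step(x, g, basis, curvatures, lr, damping=1e-3):
    """Plain gradient step in the orthogonal complement; curvature-scaled
    step inside the sketched subspace (assumed form, not from the paper)."""
    update = [lr * gi for gi in g]
    for b, lam in zip(basis, curvatures):
        c = dot(b, g)
        # Swap the lr*c component along b for c / (|lam| + damping).
        update = axpy(c * (1.0 / (abs(lam) + damping) - lr), b, update)
    return axpy(-1.0, update, x)

def run(grad, loss, x0, k, lr, steps=20, refresh_every=5):
    """k=0 never sketches, so it reduces to plain gradient descent."""
    x, basis, curvatures = list(x0), [], []
    for t in range(steps):
        if k > 0 and t % refresh_every == 0:  # periodic refresh
            basis, curvatures = sketch(grad, x, k)
        x = step(x, grad(x), basis, curvatures, lr)
    return loss(x)

# Toy anisotropic quadratic (hypothetical test problem, not from the paper).
SCALES = [100.0, 10.0, 1.0, 1.0]

def loss(x):
    return 0.5 * sum(a * xi * xi for a, xi in zip(SCALES, x))

def grad(x):
    return [a * xi for a, xi in zip(SCALES, x)]

x0 = [1.0, 1.0, 1.0, 1.0]
loss_sketched = run(grad, loss, x0, k=2, lr=0.5)  # sharp directions preconditioned
loss_plain = run(grad, loss, x0, k=0, lr=0.5)     # curvature-free ablation
print(loss_sketched < loss_plain)
```

On this toy problem, a stepsize of 0.5 is unstable for plain gradient descent because the sharpest curvature is 100, but it becomes usable once the two sharpest directions are handled inside the sketch. That is the sense in which preconditioning a low-rank subspace can widen the stable stepsize range: the first-order part only has to cope with the residual curvature outside the sketch.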
