ML | LG | OC | Feb 13, 2024

Corridor Geometry in Gradient-Based Optimization

arXiv:2402.08818v1 | 1 citation | h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses the stability and efficiency of gradient-based optimization, a topic relevant to machine learning practitioners. It is rated as incremental because it builds on known concepts such as the Polyak step-size.

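The paper characterizes regions of the loss surface called corridors, where the solutions of the gradient flow are straight lines. Inside a corridor, gradient descent and the gradient flow follow the same trajectory and the loss decreases linearly, so the implicit regularization effects and training instabilities attributed to the drift between the two do not occur. The paper introduces the Corridor Learning Rate (CLR), a learning rate adaptation scheme that coincides with a special case of the Polyak step-size, and reports results on CIFAR-10 and ImageNet that further support the good convergence properties of the Polyak step-size for neural networks.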

We characterize regions of a loss surface as corridors when the continuous curves of steepest descent -- the solutions of the gradient flow -- become straight lines. We show that corridors provide insights into gradient-based optimization, since corridors are exactly the regions where gradient descent and the gradient flow follow the same trajectory, while the loss decreases linearly. As a result, inside corridors there are no implicit regularization effects or training instabilities that have been shown to occur due to the drift between gradient descent and the gradient flow. Using the linear decrease of the loss on corridors, we devise a learning rate adaptation scheme for gradient descent; we call this scheme Corridor Learning Rate (CLR). The CLR formulation coincides with a special case of Polyak step-size, discovered in the context of convex optimization. The Polyak step-size has recently been shown to also have good convergence properties for neural networks; we further confirm this here with results on CIFAR-10 and ImageNet.
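
To make the connection with the Polyak step-size concrete, here is a minimal sketch of gradient descent with the classical Polyak rule, h = (L(theta) - L*) / ||grad L(theta)||^2. The abstract does not say which special case CLR corresponds to, so the default target loss L* = 0 below is an assumption made for illustration. The function names and the two toy losses are hypothetical and are not taken from the paper.

```python
import math


def polyak_step_size(loss, grad, target_loss=0.0, eps=1e-12):
    """Classical Polyak rule: (L(theta) - L*) / ||grad L(theta)||^2.

    target_loss=0.0 is an assumption for this sketch, not the paper's stated choice.
    eps guards against division by zero when the gradient vanishes.
    """
    grad_sq_norm = sum(g * g for g in grad)
    return (loss - target_loss) / (grad_sq_norm + eps)


def gradient_descent_polyak(loss_and_grad, theta, steps, target_loss=0.0):
    """Gradient descent that recomputes the Polyak step-size at every iteration."""
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad(theta)
        history.append(loss)
        h = polyak_step_size(loss, grad, target_loss)
        theta = [t - h * g for t, g in zip(theta, grad)]
    return theta, history


def norm_loss(theta):
    """Toy loss L(theta) = ||theta||_2 (hypothetical example).

    Its steepest-descent curves are straight rays to the origin, and the
    loss falls linearly along them.
    """
    n = math.sqrt(sum(t * t for t in theta))
    if n == 0.0:
        return 0.0, [0.0 for _ in theta]
    return n, [t / n for t in theta]


def quadratic_loss(theta):
    """Toy loss 0.5 * (theta_0^2 + 10 * theta_1^2), with curved descent paths."""
    scales = [1.0, 10.0]
    loss = 0.5 * sum(s * t * t for s, t in zip(scales, theta))
    return loss, [s * t for s, t in zip(scales, theta)]


if __name__ == "__main__":
    theta, history = gradient_descent_polyak(norm_loss, [3.0, 4.0], steps=1)
    print("norm loss after one step:", norm_loss(theta)[0])

    theta, history = gradient_descent_polyak(quadratic_loss, [1.0, 1.0], steps=100)
    print("quadratic loss: start", history[0], "end", quadratic_loss(theta)[0])
```

On the norm loss, a single step lands (up to rounding) on the minimizer. Along a straight descent line with linearly decreasing loss, one gradient step of size h lowers the loss by h times the squared gradient norm, and the Polyak rule picks h so that this drop equals the gap to the target. On the anisotropic quadratic, where the descent curves bend, the same rule still makes progress but needs many steps.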
