Greedy Alignment Principle for Optimizer Selection
For practitioners training neural networks, this work provides a principled method for dynamically selecting optimizer hyperparameters, momentum in particular, reducing manual tuning effort while matching or improving the performance of the best fixed settings.
The paper introduces the Greedy Alignment Principle (GAP) for optimizer selection: treating an optimizer as a causal filter from gradients to updates, it selects the optimizer within a prescribed family that maximizes the expected drop rate in loss. Experiments across image classification, language model fine-tuning, and vision transformer fine-tuning show that dynamic momentum rules derived from GAP match or improve upon the best fixed hyperparameters found by manual sweeps, reducing the need for exhaustive tuning.
Recent work has shown that gradient-update alignment is a powerful signal for modulating optimizer updates, often leading to faster training. We promote this update-wise heuristic to a mathematically grounded principle for selecting and tuning optimizer hyperparameters. By treating gradients and updates as signals and an optimizer as a causal filter that maps between them, we formulate optimizer selection as maximizing the expected drop rate in loss over a prescribed family of optimizers. We show that this objective is exactly the inner product between the optimizer filter and the gradient autocorrelation, and prove that a greedy optimum exists and satisfies a stability bound under perturbations of the estimated gradient statistics. Specializing to momentum-based optimizers, the theory yields simple dynamic momentum selection rules for both SGD+Momentum and Adam/AdamW. Experiments across image classification, language model fine-tuning, and vision transformer fine-tuning show that the resulting dynamic momentum rules match or improve upon the best fixed hyperparameters found via manual sweeps, reducing the need for exhaustive tuning. Code is available at https://github.com/ironjr/gap
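
To make the filter view concrete, the following is a minimal illustrative sketch and not the authors' implementation. To first order, a step along an update u_t lowers the loss in proportion to the alignment <g_t, u_t>. If the update is a causal filter of past gradients, u_t = sum_k h[k] g_{t-k}, the expected alignment becomes sum_k h[k] R[k], where R[k] = E<g_t, g_{t-k}> is the gradient autocorrelation. The sketch scores a grid of momentum values by this inner product and picks the best one. The function names (estimate_autocorr, ema_filter, greedy_beta), the grid search, and the unit-l2-norm scaling of the filter are all assumptions made for illustration; the paper's actual family constraint and selection rules may differ.

```python
import numpy as np

def estimate_autocorr(grads, max_lag):
    """Empirical R[k] = mean over t of <g_t, g_{t-k}>, for k = 0..max_lag.

    grads: array of shape (T, d), one flattened gradient per step (T > max_lag).
    """
    grads = np.asarray(grads, dtype=float)
    T = grads.shape[0]
    return np.array([
        np.mean(np.sum(grads[k:] * grads[:T - k], axis=1))
        for k in range(max_lag + 1)
    ])

def ema_filter(beta, num_taps):
    """Truncated momentum filter h[k] proportional to beta**k.

    Assumption of this sketch: the filter is scaled to unit l2 norm so that
    different beta values are compared at equal filter energy.
    """
    h = beta ** np.arange(num_taps)
    return h / np.linalg.norm(h)

def greedy_beta(autocorr, betas):
    """Return the beta whose filter has the largest inner product with R."""
    scores = np.array([ema_filter(b, len(autocorr)) @ autocorr for b in betas])
    return betas[int(np.argmax(scores))], scores

# Synthetic example: geometrically decaying autocorrelation R[k] = rho**k.
rho = 0.8
autocorr = rho ** np.arange(200)
betas = np.linspace(0.0, 0.99, 100)
best_beta, scores = greedy_beta(autocorr, betas)
print(f"selected beta: {best_beta:.2f}")
```

Under the unit-energy assumption, the score for this synthetic autocorrelation is sqrt(1 - beta^2) / (1 - beta * rho), which peaks at beta = rho, so the selected momentum tracks how slowly the gradient correlation decays. In a training loop, one could plausibly feed a sliding window of recent gradients to estimate_autocorr and re-run the selection periodically; that usage is likewise a hypothetical pattern rather than the released code's interface.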