Directional Consistency as a Complementary Optimization Signal: The GONO Framework
For deep learning practitioners, this work introduces directional consistency as a new optimization signal that can improve training dynamics, although its empirical gains over AdamW are incremental.
The paper shows that directional alignment and loss convergence can be decoupled in deep learning optimization and proposes GONO, an optimizer that adapts momentum according to the directional consistency of successive gradients. GONO matches Adam's convergence rate and performs competitively on MNIST (98.15%), CIFAR-10 (43.14%), and ResNet-18 (75.44%), while its directional-consistency signal detects oscillations with F1 = 1.00.
We identify and formalize an underexplored phenomenon in deep learning optimization: directional alignment and loss convergence can be decoupled. An optimizer can exhibit near-perfect directional consistency (cc_t → 1, where cc_t is the cosine similarity between consecutive gradients) while the loss remains high or decreases only slowly. This observation highlights that existing optimizers such as Adam, SGD, and RMSprop have no explicit mechanism for exploiting temporal consistency in gradient directions; they rely instead on magnitude-based signals, which fail to distinguish plateaus, saddle points, and genuine convergence. Motivated by this, we introduce GONO (Gradient-Oriented Norm-Adaptive Optimizer), which adapts Adam's momentum coefficient β_1 according to cc_t, amplifying momentum when gradient directions are consistent and suppressing it during oscillation. We prove that GONO matches Adam's O(1/√T) convergence rate and reduces exactly to Adam when the signal is uninformative. Empirically, cc_t detects oscillations with an F1 score of 1.00 (vs. 0.45 for the gradient norm), and GONO remains competitive with AdamW on MNIST (98.15%), CIFAR-10 (43.14%), and ResNet-18 (75.44%), establishing directional alignment as a theoretically grounded, practically actionable optimization signal. Code: https://github.com/victordaniel/gono-optimizer
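To make the mechanism concrete, here is a minimal pure-Python sketch that computes cc_t as the cosine similarity of consecutive gradients and feeds it into an otherwise standard Adam update. It is an illustration under stated assumptions, not GONO's published rule: the linear-and-clipped mapping β_1,t = clip(β_1 + γ·cc_t, β_min, β_max), the parameters `gamma`, `beta1_min`, `beta1_max`, the class name `GonoLikeAdam`, the helper `run`, and the convention cc_1 = 0 are all hypothetical. For the authors' actual implementation, see the linked repository. Because β_1 changes over time, the bias correction divides by one minus the running product of the β_1 values instead of 1 − β_1^t. With `gamma = 0` this collapses to standard Adam, mirroring the "reduces exactly to Adam" property in spirit.

```python
import math
from typing import List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float], eps: float = 1e-12) -> float:
    """Cosine of the angle between two gradient vectors (eps guards zero norms)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


class GonoLikeAdam:
    """Adam with beta_1 modulated by cc_t (hypothetical mapping, not GONO's exact rule)."""

    def __init__(self, dim: int, lr: float = 1e-3, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8, gamma: float = 0.05,
                 beta1_min: float = 0.5, beta1_max: float = 0.99):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.gamma, self.beta1_min, self.beta1_max = gamma, beta1_min, beta1_max
        self.m = [0.0] * dim           # first-moment estimate
        self.v = [0.0] * dim           # second-moment estimate
        self.beta1_prod = 1.0          # running product of the beta_1 values used so far
        self.beta2_prod = 1.0          # running product of beta_2 (equals beta2 ** t)
        self.prev_grad: Optional[List[float]] = None
        self.last_cc = 0.0
        self.last_beta1 = beta1

    def step(self, params: List[float], grad: List[float]) -> List[float]:
        # cc_t: cosine similarity between consecutive gradients; 0.0 on the first step.
        cc = 0.0 if self.prev_grad is None else cosine_similarity(grad, self.prev_grad)
        # Hypothetical mapping: raise beta_1 when cc_t > 0, lower it when cc_t < 0, then clip.
        beta1_t = min(self.beta1_max, max(self.beta1_min, self.beta1 + self.gamma * cc))
        self.beta1_prod *= beta1_t
        self.beta2_prod *= self.beta2
        new_params = []
        for i, (p, g) in enumerate(zip(params, grad)):
            self.m[i] = beta1_t * self.m[i] + (1.0 - beta1_t) * g
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g
            # Bias correction using the product of the time-varying beta_1 values.
            m_hat = self.m[i] / (1.0 - self.beta1_prod)
            v_hat = self.v[i] / (1.0 - self.beta2_prod)
            new_params.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        self.prev_grad = list(grad)
        self.last_cc, self.last_beta1 = cc, beta1_t
        return new_params


def run(grads: List[List[float]], gamma: float = 0.05) -> Tuple[List[float], List[float], List[float]]:
    """Feed a fixed gradient sequence; return final params, the cc_t trace, and the beta_1 trace."""
    opt = GonoLikeAdam(dim=len(grads[0]), gamma=gamma)
    params = [0.0] * len(grads[0])
    ccs, betas = [], []
    for g in grads:
        params = opt.step(params, g)
        ccs.append(opt.last_cc)
        betas.append(opt.last_beta1)
    return params, ccs, betas


if __name__ == "__main__":
    # Both synthetic sequences have the same gradient norm at every step.
    consistent = [[1.0, 0.5] for _ in range(6)]
    oscillating = [[1.0, 0.5] if k % 2 == 0 else [-1.0, -0.5] for k in range(6)]
    for name, seq in [("consistent", consistent), ("oscillating", oscillating)]:
        _, ccs, betas = run(seq)
        print(name, [round(c, 3) for c in ccs], [round(b, 3) for b in betas])
```

The two synthetic sequences have identical per-step gradient norms, so a norm-based signal cannot separate them. In contrast, cc_t sits near +1 for the consistent sequence (pushing β_1 up) and near −1 for the alternating one (pushing β_1 down). The clipping bounds are a design choice that keeps β_1 in a range where the moment estimate stays stable.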