LG | AI | NE | ML | Apr 22, 2025

AlphaGrad: Non-Linear Gradient Normalization Optimizer

arXiv:2504.16020v2 | 1 citation | h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses memory-constrained optimization problems in reinforcement learning, offering an incremental improvement with context-dependent benefits.

The paper tackles the memory overhead and hyperparameter complexity of adaptive optimizers like Adam by introducing AlphaGrad, a non-linear gradient normalization optimizer. Relative to Adam, AlphaGrad trains more stably with competitive results in TD3 (given careful $\alpha$ tuning) and performs substantially better in on-policy PPO, but it is unstable in off-policy DQN.

We introduce AlphaGrad, a memory-efficient, conditionally stateless optimizer addressing the memory overhead and hyperparameter complexity of adaptive methods like Adam. AlphaGrad enforces scale invariance via tensor-wise L2 gradient normalization followed by a smooth hyperbolic tangent transformation, $g' = \tanh(\alpha \cdot \tilde{g})$, controlled by a single steepness parameter $\alpha$. Our contributions include: (1) the AlphaGrad algorithm formulation; (2) a formal non-convex convergence analysis guaranteeing stationarity; (3) extensive empirical evaluation on diverse RL benchmarks (DQN, TD3, PPO). Compared to Adam, AlphaGrad demonstrates a highly context-dependent performance profile. While exhibiting instability in off-policy DQN, it provides enhanced training stability with competitive results in TD3 (requiring careful $\alpha$ tuning) and achieves substantially superior performance in on-policy PPO. These results underscore the critical importance of empirical $\alpha$ selection, revealing strong interactions between the optimizer's dynamics and the underlying RL algorithm. AlphaGrad presents a compelling alternative optimizer for memory-constrained scenarios and shows significant promise for on-policy learning regimes where its stability and efficiency advantages can be particularly impactful.
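
The transformation described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each gradient tensor is represented as a flat list of floats, that $\tilde{g}$ is the gradient divided by its L2 norm, and that the transformed gradient is applied with a plain learning-rate step. The function names, the `eps` guard against division by zero, and the `lr` parameter are assumptions introduced here for illustration.

```python
import math


def alphagrad_direction(grad, alpha, eps=1e-8):
    """Return tanh(alpha * g / ||g||_2) for one gradient tensor.

    `grad` is a flat list of floats standing in for a single tensor.
    `eps` is an assumed numerical guard, not taken from the abstract.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    normalized = [g / (norm + eps) for g in grad]
    return [math.tanh(alpha * g) for g in normalized]


def alphagrad_step(params, grad, lr, alpha):
    """Apply one stateless update: params - lr * transformed gradient.

    The plain learning-rate step is an assumption; the abstract only
    specifies the normalization and the tanh transformation.
    """
    direction = alphagrad_direction(grad, alpha)
    return [p - lr * d for p, d in zip(params, direction)]


# Rescaling the raw gradient leaves the direction essentially unchanged,
# which is the scale invariance the abstract refers to.
small = alphagrad_direction([3.0, 4.0], alpha=1.0)
large = alphagrad_direction([300.0, 400.0], alpha=1.0)
updated = alphagrad_step([1.0, 1.0], [3.0, 4.0], lr=0.1, alpha=1.0)
```

Because the normalized entries lie in $[-1, 1]$ and tanh is bounded, every component of the update has magnitude below `lr`, regardless of the raw gradient scale. In this sketch no per-parameter statistics are stored between steps, which is where the memory saving over Adam would come from.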
