LG · Oct 2, 2025

Gradient Shaping Beyond Clipping: A Functional Perspective on Update Magnitude Control

arXiv:2510.01578v1 · 3 citations · h-index: 3 · MMAsia

Originality: Highly original

AI Analysis

This work addresses the need for more adaptive gradient control methods in machine learning, offering a principled alternative to rigid heuristics for researchers and practitioners.

The paper argues that gradient clipping, as a hard, fixed threshold, is too inflexible for deep network training. It proposes SPAMP, a framework for smooth, per-layer gradient shaping, and reports improved stability, convergence, and robustness across image and language tasks.

Gradient clipping is widely used to stabilize deep network training, but its formulation as a hard, fixed threshold limits flexibility and ignores gradient distribution dynamics. We propose SPAMP (Statistical Per-layer Adaptive Modulation and Projection), a unified framework that generalizes clipping into smooth, per-layer gradient shaping. SPAMP tracks local gradient statistics, dynamically estimates thresholds, and applies power-based transformations to modulate update magnitudes in a differentiable manner. This perspective recasts clipping and warmup as dual mechanisms for controlling the effective update scale $\eta_t \|g_t\|$, offering a principled alternative to rigid heuristics. Extensive experiments across image and language tasks demonstrate that SPAMP improves stability, convergence, and robustness over existing methods.
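The abstract names three ingredients (per-layer gradient statistics, a dynamically estimated threshold, and a power-based transformation) without giving formulas. The following is a minimal sketch of how such a shaper could look for a single layer, not the paper's actual update rule. Everything specific here is an assumption made for illustration: the class name `SmoothLayerShaper`, the exponential moving averages, the threshold `mean + k * std`, and the scale `(1 + (norm / tau)^2)^(-power / 2)`. With `power=1.0` the sketch behaves like a smooth analogue of norm clipping: small gradients pass almost unchanged, and the shaped norm approaches but never exceeds the threshold.

```python
import math


class SmoothLayerShaper:
    """Hypothetical per-layer smooth gradient shaper (illustrative only)."""

    def __init__(self, beta=0.9, k=2.0, power=1.0, eps=1e-12):
        self.beta = beta    # decay of the running norm statistics (assumed)
        self.k = k          # threshold = running mean + k * running std (assumed)
        self.power = power  # exponent of the power-based scale (assumed)
        self.eps = eps      # guards against division by a zero threshold
        self.mean = None    # running mean of this layer's gradient norm
        self.var = 0.0      # running variance of this layer's gradient norm

    def threshold(self):
        """Current dynamically estimated threshold for this layer."""
        return self.mean + self.k * math.sqrt(self.var)

    def shape(self, grad):
        """Return a rescaled copy of `grad` (a list of floats)."""
        norm = math.sqrt(sum(g * g for g in grad))
        if self.mean is None:
            # First step: initialise the statistics and leave the gradient as is.
            self.mean = norm
            return list(grad)

        tau = self.threshold()
        ratio = norm / (tau + self.eps)
        # Smooth, differentiable scale: about 1 when norm << tau,
        # about (tau / norm) ** power when norm >> tau.
        scale = (1.0 + ratio * ratio) ** (-self.power / 2.0)

        # Update the running statistics after the scale has been computed.
        delta = norm - self.mean
        self.mean += (1.0 - self.beta) * delta
        self.var = self.beta * (self.var + (1.0 - self.beta) * delta * delta)
        return [g * scale for g in grad]
```

In use, one shaper instance would be kept per layer and `shape` called on that layer's gradient before the optimizer step. Because the scale multiplies the whole gradient, the direction is preserved and only the magnitude $\|g_t\|$ entering the effective update scale $\eta_t \|g_t\|$ changes.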
