LG ML · May 30, 2025

On the Interaction of Noise, Compression Role, and Adaptivity under $(L_0, L_1)$-Smoothness: An SDE-based Approach

Enea Monzio Compagnoni, Rustem Islamov, Antonio Orvieto, Eduard Gorbunov

arXiv:2506.00181v19.42 citations · h-index: 28

Originality: Incremental advance

AI Analysis

The paper provides theoretical insights into the interplay of noise, compression, and adaptivity in distributed optimization. For researchers in machine learning theory, it is an incremental advance.

The paper analyzes the dynamics of distributed stochastic gradient methods under $(L_0, L_1)$-smoothness and flexible noise assumptions. It shows that adaptive methods such as Distributed SignSGD converge even under heavy-tailed noise, while Distributed (Compressed) SGD with a pre-scheduled decaying learning rate fails to converge unless the schedule also depends inversely on the gradient norm.
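
For reference, one common formulation of $(L_0, L_1)$-smoothness for a twice-differentiable $f$ bounds the Hessian norm by an affine function of the gradient norm. This is stated here as an assumption about the setting; the paper may use an equivalent or more general variant.

$$\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\| \quad \text{for all } x.$$

Setting $L_1 = 0$ recovers standard $L$-smoothness with $L = L_0$. A function such as $f(x) = x^4/4$ satisfies this condition for suitable $L_0, L_1$ even though its second derivative is unbounded.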

Using stochastic differential equation (SDE) approximations, we study the dynamics of Distributed SGD, Distributed Compressed SGD, and Distributed SignSGD under $(L_0, L_1)$-smoothness and flexible noise assumptions. Our analysis provides insights -- which we validate through simulation -- into the intricate interactions between batch noise, stochastic gradient compression, and adaptivity in this modern theoretical setup. For instance, we show that adaptive methods such as Distributed SignSGD can successfully converge under standard assumptions on the learning rate scheduler, even under heavy-tailed noise. On the contrary, Distributed (Compressed) SGD with pre-scheduled decaying learning rate fails to achieve convergence, unless such a schedule also accounts for an inverse dependency on the gradient norm -- de facto falling back into an adaptive method.
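
The contrast described in the abstract can be illustrated with a small toy simulation. The sketch below is not the paper's experiment: the objective $f(x) = x^4/4$, the Cauchy gradient noise, the $1/\sqrt{k+1}$ schedule, the sign-averaging aggregation rule, and all names (`run`, `sgd_step`, `signsgd_step`, `blowup`) are assumptions chosen for illustration. It compares a mean-of-gradients update with a mean-of-signs update under the same pre-scheduled decaying learning rate.

```python
import math
import random


def grad(x):
    # Gradient of the toy objective f(x) = x**4 / 4 (assumed for illustration).
    return x * x * x


def cauchy(rng, scale=1.0):
    # Inverse-CDF sampler for a Cauchy variate: a simple heavy-tailed noise model.
    return scale * math.tan(math.pi * (rng.random() - 0.5))


def sign(v):
    # Returns -1, 0, or 1.
    return (v > 0) - (v < 0)


def sgd_step(x, worker_grads, lr):
    # Distributed SGD: the server averages the workers' stochastic gradients.
    return x - lr * sum(worker_grads) / len(worker_grads)


def signsgd_step(x, worker_grads, lr):
    # Sign-based variant: the server averages the workers' gradient signs,
    # so a single step never moves x by more than lr.
    return x - lr * sum(sign(g) for g in worker_grads) / len(worker_grads)


def run(step, x0=3.0, workers=8, iters=2000, lr0=0.5, seed=0, blowup=1e6):
    # Runs one method with the pre-scheduled learning rate lr0 / sqrt(k + 1).
    # Returns the last iterate and the number of steps taken; the loop stops
    # early once |x| exceeds `blowup`, which is treated as divergence.
    rng = random.Random(seed)
    x = x0
    for k in range(iters):
        lr = lr0 / math.sqrt(k + 1)
        grads = [grad(x) + cauchy(rng) for _ in range(workers)]
        x = step(x, grads, lr)
        if abs(x) > blowup:
            return x, k + 1
    return x, iters


if __name__ == "__main__":
    for name, step in [("SGD", sgd_step), ("SignSGD", signsgd_step)]:
        x, n = run(step)
        print(f"{name}: x = {x:.3g} after {n} steps")
```

Because the sign update has bounded steps, its iterates cannot blow up regardless of how large the gradient or the noise is. The plain SGD update has a step proportional to $x^3$, so a large initial point or a single heavy-tailed noise sample can push it into a region where the scheduled learning rate is too large for the local curvature.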
