Convergence Analysis of Randomized Subspace Normalized SGD under Heavy-Tailed Noise
This work addresses convergence guarantees for stochastic optimization methods in machine learning, but it is incremental as it builds on existing subspace and normalized SGD approaches.
The paper tackles the problem of analyzing randomized subspace methods in nonconvex optimization under heavy-tailed noise, proving that RS-NSGD achieves better oracle complexity than full-dimensional normalized SGD.
Randomized subspace methods reduce per-iteration cost; however, in nonconvex optimization, most analyses are expectation-based, and high-probability bounds remain scarce even under sub-Gaussian noise. We first prove that randomized subspace SGD (RS-SGD) admits a high-probability convergence bound under sub-Gaussian noise, achieving the same order of oracle complexity as prior in-expectation results. Motivated by the prevalence of heavy-tailed gradients in modern machine learning, we then propose randomized subspace normalized SGD (RS-NSGD), which integrates direction normalization into subspace updates. Assuming the noise has bounded $p$-th moments, we establish both in-expectation and high-probability convergence guarantees, and show that RS-NSGD can achieve better oracle complexity than full-dimensional normalized SGD.