Accelerating Single-Pass SGD for Generalized Linear Prediction

arXiv:2603.01951v11.4h-index: 1

Originality: Highly original

AI Analysis

This paper offers an incremental improvement for machine learning practitioners who fit generalized linear models on streaming data.

The paper studies how to accelerate single-pass stochastic gradient descent for generalized linear prediction in streaming settings. It shows that momentum, incorporated through a novel data-dependent proximal method, resolves the open problem posed by Jain et al. [2018a] and is more effective than variance reduction in this setting.

We study generalized linear prediction in a streaming setting, where each iteration uses only one fresh data point for a gradient-level update. While momentum is well-established in deterministic optimization, a fundamental open question is whether it can accelerate such single-pass non-quadratic stochastic optimization. We propose the first algorithm that successfully incorporates momentum via a novel data-dependent proximal method, achieving dual-momentum acceleration. Our derived excess risk bound decomposes into three components: an improved optimization error, a minimax optimal statistical error, and a higher-order model-misspecification error. The proof handles misspecification via a fine-grained stationary analysis of inner updates, while localizing statistical error through a two-phase outer-loop analysis. As a result, we resolve the open problem posed by Jain et al. [2018a] and demonstrate that momentum acceleration is more effective than variance reduction for generalized linear prediction in the streaming setting.
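To make the streaming setting concrete, here is a minimal sketch of single-pass SGD for one generalized linear model, where each iteration consumes exactly one fresh sample. It is an illustrative baseline with plain heavy-ball momentum and iterate averaging, not the paper's data-dependent proximal method, which the abstract does not specify. The logistic link, the step size, the momentum value, and all function names (`sample_stream`, `single_pass_sgd`, `l2_error`) are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sample_stream(w_true, n, rng):
    # Yield n fresh (x, y) pairs from a well-specified logistic model.
    # Each pair is produced once and never revisited (single pass).
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in w_true]
        y = 1.0 if rng.random() < sigmoid(dot(x, w_true)) else 0.0
        yield x, y


def single_pass_sgd(stream, dim, step=0.05, momentum=0.0):
    # One gradient-level update per fresh sample.
    # momentum=0.0 gives plain SGD; momentum>0 gives heavy-ball SGD
    # (an assumed stand-in, not the paper's accelerated scheme).
    w = [0.0] * dim      # current iterate
    v = [0.0] * dim      # velocity (momentum buffer)
    w_avg = [0.0] * dim  # running average of all iterates
    for t, (x, y) in enumerate(stream, start=1):
        # Gradient of the logistic loss at one sample is (sigmoid(<x, w>) - y) * x.
        residual = sigmoid(dot(x, w)) - y
        for j in range(dim):
            v[j] = momentum * v[j] - step * residual * x[j]
            w[j] += v[j]
            w_avg[j] += (w[j] - w_avg[j]) / t
    return w_avg


def l2_error(w, w_ref):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w, w_ref)))


rng = random.Random(0)
w_true = [1.0, -2.0, 0.5]  # hypothetical ground-truth parameter
w_plain = single_pass_sgd(sample_stream(w_true, 20000, rng), dim=3)
w_mom = single_pass_sgd(sample_stream(w_true, 20000, rng), dim=3, momentum=0.5)
print("plain SGD error:   ", l2_error(w_plain, w_true))
print("momentum SGD error:", l2_error(w_mom, w_true))
```

Both runs see 20,000 samples exactly once. The sketch only shows the data-access pattern the abstract describes; it makes no claim about which variant achieves the better excess risk.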
