Stochastic Optimization under Distributional Drift
This work addresses distributional drift in machine learning and signal processing and offers theoretical guidance for practitioners working with evolving data. Its contribution appears incremental, since it builds on existing stochastic optimization methods.
The paper studies the minimization of a convex function that evolves under unknown stochastic dynamics, such as concept drift. It provides non-asymptotic convergence guarantees for stochastic algorithms with iterate averaging. The bounds hold both in expectation and with high probability, and they decouple the contributions of optimization error, gradient noise, and time drift.
We consider the problem of minimizing a convex function that is evolving according to unknown and possibly stochastic dynamics, which may depend jointly on time and on the decision variable itself. Such problems abound in the machine learning and signal processing literature, under the names of concept drift, stochastic tracking, and performative prediction. We provide novel non-asymptotic convergence guarantees for stochastic algorithms with iterate averaging, focusing on bounds valid both in expectation and with high probability. The efficiency estimates we obtain clearly decouple the contributions of optimization error, gradient noise, and time drift. Notably, we identify a low drift-to-noise regime in which the tracking efficiency of the proximal stochastic gradient method benefits significantly from a step decay schedule. Numerical experiments illustrate our results.
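To make the setting concrete, the following is a minimal sketch of a proximal stochastic gradient loop with a step decay schedule and iterate averaging, tracking a drifting minimizer. It is an illustration under stated assumptions, not the paper's algorithm or experimental setup. The quadratic loss, the Gaussian random-walk drift, the uniform within-epoch average, the halving schedule, and every name and default value (`prox_sgd_step_decay`, `soft_threshold`, `eta0`, `drift_std`, and so on) are hypothetical choices made for this example.

```python
import random


def soft_threshold(v, tau):
    """Proximal operator of tau * |v| for a scalar v."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def prox_sgd_step_decay(dim=5, epochs=4, iters_per_epoch=200, eta0=0.5,
                        noise_std=1.0, drift_std=0.01, l1_weight=0.0, seed=0):
    """Track the minimizer of 0.5 * ||x - target_t||^2 + l1_weight * ||x||_1.

    Assumed toy model: target_t follows a Gaussian random walk (time drift)
    and gradients are corrupted by Gaussian noise (gradient noise).
    """
    rng = random.Random(seed)
    target = [5.0] * dim  # drifting point; equals the minimizer when l1_weight is 0
    x = [0.0] * dim       # initial iterate
    eta = eta0
    history = []          # (step size, squared tracking error of the average) per epoch

    for _ in range(epochs):
        avg = list(x)  # uniform average over the epoch, starting point included
        for k in range(1, iters_per_epoch + 1):
            # Noisy gradient of the smooth part at the current iterate.
            grad = [xi - ti + rng.gauss(0.0, noise_std)
                    for xi, ti in zip(x, target)]
            # Proximal stochastic gradient step.
            x = [soft_threshold(xi - eta * gi, eta * l1_weight)
                 for xi, gi in zip(x, grad)]
            # Running uniform average of the k + 1 points seen this epoch.
            avg = [a + (xi - a) / (k + 1) for a, xi in zip(avg, x)]
            # The target drifts after every step.
            target = [ti + rng.gauss(0.0, drift_std) for ti in target]
        err = sum((a - t) ** 2 for a, t in zip(avg, target))
        history.append((eta, err))
        eta /= 2.0  # step decay: halve the step size between epochs

    return x, avg, target, history


if __name__ == "__main__":
    _, _, _, history = prox_sgd_step_decay()
    for eta, err in history:
        print(f"step size {eta:.4f}: squared tracking error of average {err:.4f}")
```

In this sketch the drift (`drift_std`) is small relative to the gradient noise (`noise_std`), which loosely mimics the low drift-to-noise regime mentioned above. Raising `drift_std` while keeping the same schedule is one way to see when small step sizes start to lag behind the moving target.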