LG | OC | ML | Mar 27, 2023

Learning Rate Schedules in the Presence of Distribution Shift

arXiv:2303.15634v2 | 11 citations | h-index: 62
Originality: Incremental advance
AI Analysis

This work addresses the challenge of adapting learning rates for online learning in dynamic environments. The advance is incremental, but it comes with theoretical guarantees for practical settings such as regression and neural networks.

The paper tackles the problem of designing learning rate schedules for SGD-based online learning under distribution shift. It characterizes the optimal schedule for online linear regression and proposes robust schedules for convex and non-convex losses, with accompanying regret bounds. It also shows that optimal schedules typically increase when the data distribution is changing.

We design learning rate schedules that minimize regret for SGD-based online learning in the presence of a changing data distribution. We fully characterize the optimal learning rate schedule for online linear regression via a novel analysis with stochastic differential equations. For general convex loss functions, we propose new learning rate schedules that are robust to distribution shift and we give upper and lower bounds for the regret that only differ by constants. For non-convex loss functions, we define a notion of regret based on the gradient norm of the estimated models and propose a learning schedule that minimizes an upper bound on the total expected regret. Intuitively, one expects changing loss landscapes to require more exploration, and we confirm that optimal learning rate schedules typically increase in the presence of distribution shift. Finally, we provide experiments for high-dimensional regression models and neural networks to illustrate these learning rate schedules and their cumulative regret.
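To make the setting in the abstract concrete, here is a minimal sketch of SGD-based online linear regression where the true parameter drifts over time. It is not the paper's method or its optimal schedule. The random-walk drift model, the two hand-picked schedules (`decaying` and `constant`), the helper name `run_sgd`, and the regret proxy (excess expected loss against the current optimum, assuming isotropic Gaussian features) are all illustrative assumptions.

```python
import random


def run_sgd(schedule, T=2000, d=5, drift=0.05, noise=0.1, seed=0):
    """Run online SGD for linear regression against a drifting target.

    `schedule(t)` returns the learning rate at step t. The return value is
    the cumulative excess expected loss relative to the time-t optimum,
    used here as a simple stand-in for regret (an assumption of this sketch).
    """
    rng = random.Random(seed)
    theta_star = [rng.gauss(0, 1) for _ in range(d)]  # true parameter, drifts each step
    theta = [0.0] * d                                 # learner's estimate
    total = 0.0
    for t in range(T):
        # For x ~ N(0, I), the excess expected squared loss is 0.5 * ||theta - theta*||^2.
        total += 0.5 * sum((a - b) ** 2 for a, b in zip(theta, theta_star))

        # Observe one noisy sample from the current distribution.
        x = [rng.gauss(0, 1) for _ in range(d)]
        y = sum(xi * si for xi, si in zip(x, theta_star)) + rng.gauss(0, noise)

        # SGD step on the loss 0.5 * (x . theta - y)^2.
        err = sum(xi * ti for xi, ti in zip(x, theta)) - y
        lr = schedule(t)
        theta = [ti - lr * err * xi for ti, xi in zip(theta, x)]

        # Distribution shift: the target takes a random-walk step.
        theta_star = [si + rng.gauss(0, drift) for si in theta_star]
    return total


def decaying(t):
    # Classic decaying schedule, a natural choice when the data is stationary.
    return 0.1 / (1 + t)


def constant(t):
    # Schedule that never decays, so the learner keeps tracking the target.
    return 0.05


regret_decaying = run_sgd(decaying)
regret_constant = run_sgd(constant)
print(f"decaying schedule: {regret_decaying:.1f}")
print(f"constant schedule: {regret_constant:.1f}")
```

In this toy setup the decaying schedule effectively stops updating while the target keeps moving, so its cumulative regret grows much faster than that of the non-decaying schedule. This agrees in spirit with the abstract's observation that schedules should not shrink under distribution shift, though the schedules derived in the paper are more refined than either one used here.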

Code Implementations: 1 repo
