LG · NE · ML · May 15, 2020

Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems

arXiv:2005.07360v1 · 24 citations
AI Analysis

This paper addresses why the learning rate schedule affects generalization in machine learning. The contribution is incremental: it extends a prior result for a non-convex setting to a convex one.

The authors show that learning rate annealing can improve generalization even in convex problems. In a 2D linear regression toy example, annealing leads gradient descent to minima with provably better generalization than a small learning rate used throughout.

Learning rate schedule can significantly affect generalization performance in modern neural networks, but the reasons for this are not yet understood. Li-Wei-Ma (2019) recently proved this behavior can exist in a simplified non-convex neural-network setting. In this note, we show that this phenomenon can exist even for convex learning problems -- in particular, linear regression in 2 dimensions. We give a toy convex problem where learning rate annealing (large initial learning rate, followed by small learning rate) can lead gradient descent to minima with provably better generalization than using a small learning rate throughout. In our case, this occurs due to a combination of the mismatch between the test and train loss landscapes, and early-stopping.

