LG · DS · OC · Oct 7, 2021

Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect

arXiv:2110.03677v2 · 54 citations
Originality: Incremental advance
AI Analysis

The paper offers incremental theoretical insight into the benefits of large learning rates, relevant to researchers in optimization and deep learning.

The paper tackles the theoretical justification for using large learning rates in deep learning. It analyzes gradient descent on a homogeneous matrix factorization problem, proves convergence for learning rates beyond the classical $2/L$ stability bound, and establishes an implicit balancing effect between the magnitudes of the two factors.
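A short calculation makes the balancing effect concrete. For illustration we take the special case of a scalar target $a$ with vector factors $x, y \in \mathbb{R}^d$, i.e. $f(x, y) = \tfrac12 (a - x^\top y)^2$. The special case and the ½ scaling are our own choices. With residual $r = x^\top y - a$, one GD step of size $\eta$ changes the gap between the squared factor norms multiplicatively:

```latex
\begin{aligned}
x^{+} &= x - \eta r\, y, \qquad y^{+} = y - \eta r\, x, \\
\|x^{+}\|^2 &= \|x\|^2 - 2\eta r\, x^\top y + \eta^2 r^2 \|y\|^2, \\
\|y^{+}\|^2 &= \|y\|^2 - 2\eta r\, x^\top y + \eta^2 r^2 \|x\|^2, \\
\|x^{+}\|^2 - \|y^{+}\|^2 &= \bigl(1 - \eta^2 r^2\bigr)\bigl(\|x\|^2 - \|y\|^2\bigr).
\end{aligned}
```

Gradient flow ($\eta \to 0$) therefore preserves the gap. A finite step multiplies it by $1 - \eta^2 r^2$, which has magnitude below one whenever $0 < |\eta r| < \sqrt{2}$, and a step with $|\eta r|$ near one nearly erases it. This is only a heuristic consistent with the balancing claim, not the paper's convergence proof.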

Recent empirical advances show that training deep models with a large learning rate often improves generalization performance. However, theoretical justifications for the benefits of a large learning rate are highly limited, owing to challenges in the analysis. In this paper, we consider using Gradient Descent (GD) with a large learning rate on a homogeneous matrix factorization problem, i.e., $\min_{X, Y} \|A - XY^\top\|_{\sf F}^2$. We prove a convergence theory for constant large learning rates well beyond $2/L$, where $L$ is the largest eigenvalue of the Hessian at initialization. Moreover, we rigorously establish an implicit bias of GD induced by such a large learning rate, termed 'balancing', meaning that the magnitudes of $X$ and $Y$ at the limit of the GD iterations will be close even if their initialization is significantly unbalanced. Numerical experiments are provided to support our theory.
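The following minimal NumPy sketch illustrates the setting described in the abstract. It runs plain GD on $f(X, Y) = \tfrac12 \|A - XY^\top\|_{\sf F}^2$. The factor ½ is our assumption: it only rescales step sizes, and $L$ is computed for this same loss. All helper names (`top_hessian_eigenvalue`, `gradient_descent`, `norm_gap`, etc.) are hypothetical. The Hessian is formed densely, so the sketch is meant only for tiny problems. The toy instance $A = 1$, $X_0 = 2$, $Y_0 = 0.25$ is also our own choice and does not come from the paper's experiments.

```python
import numpy as np


def residual(A, X, Y):
    """R = X Y^T - A for the loss f(X, Y) = 0.5 * ||A - X Y^T||_F^2."""
    return X @ Y.T - A


def loss(A, X, Y):
    """Scaled squared Frobenius loss 0.5 * ||A - X Y^T||_F^2."""
    return 0.5 * float(np.sum(residual(A, X, Y) ** 2))


def hessian_vector_product(A, X, Y, U, V):
    """Second derivative of f at (X, Y) applied to the direction (U, V)."""
    R = residual(A, X, Y)
    dR = U @ Y.T + X @ V.T   # first-order change of R along (U, V)
    HU = dR @ Y + R @ V      # derivative of grad_X f = R Y
    HV = dR.T @ X + R.T @ U  # derivative of grad_Y f = R^T X
    return HU, HV


def top_hessian_eigenvalue(A, X, Y):
    """Largest Hessian eigenvalue L, from a dense Hessian (tiny problems only)."""
    m, k = X.shape
    n = Y.shape[0]
    dim = (m + n) * k
    H = np.zeros((dim, dim))
    for j in range(dim):
        e = np.zeros(dim)
        e[j] = 1.0
        U = e[: m * k].reshape(m, k)
        V = e[m * k:].reshape(n, k)
        HU, HV = hessian_vector_product(A, X, Y, U, V)
        H[:, j] = np.concatenate([HU.ravel(), HV.ravel()])
    return float(np.linalg.eigvalsh(0.5 * (H + H.T)).max())


def gradient_descent(A, X, Y, lr, steps):
    """Constant-step GD; X and Y are updated simultaneously from the same iterate."""
    X, Y = X.copy(), Y.copy()
    for _ in range(steps):
        R = residual(A, X, Y)
        X, Y = X - lr * (R @ Y), Y - lr * (R.T @ X)
    return X, Y


def norm_gap(X, Y):
    """||X||_F^2 - ||Y||_F^2, the imbalance between the two factors."""
    return float(np.sum(X ** 2) - np.sum(Y ** 2))


if __name__ == "__main__":
    # Toy instance (our choice): scalar target, badly unbalanced initialization.
    A = np.array([[1.0]])
    X0 = np.array([[2.0]])
    Y0 = np.array([[0.25]])
    L = top_hessian_eigenvalue(A, X0, Y0)
    print(f"L = {L:.4f}, 2/L = {2 / L:.4f}")
    for lr in (0.05, 0.8):
        X, Y = gradient_descent(A, X0, Y0, lr, steps=2000)
        print(f"lr={lr}: loss={loss(A, X, Y):.2e}, gap={norm_gap(X, Y):.4f}")
```

At this initialization the Hessian is $\mathrm{diag}(0.0625, 4)$, so $L = 4$ and $2/L = 0.5$. With $\eta = 0.05$, the run converges and keeps $\|X\|_{\sf F}^2 - \|Y\|_{\sf F}^2$ close to its initial value $3.9375$. With $\eta = 0.8 > 2/L$, it still converges in this instance, and the gap falls below $0.1$. Whether a given $\eta > 2/L$ converges depends on the problem and the initialization; the paper's theorems state the precise conditions.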

