ML LG OCJun 6, 2024

Error dynamics of mini-batch gradient descent with random reshuffling for least squares regression

Jackie Lok, Rishi Sonthalia, Elizaveta Rebrova

arXiv:2406.03696v23.1

Originality Incremental advance

AI Analysis

This provides theoretical insights into optimization algorithms for machine learning practitioners, but it is incremental as it builds on existing gradient descent analysis.

The paper tackles the problem of understanding the error dynamics of mini-batch gradient descent with random reshuffling in least squares regression, showing that training and generalization errors depend on a sample cross-covariance matrix and that mini-batch dynamics agree with full-batch up to leading order but exhibit step-size-dependent convergence not detectable by gradient flow analysis.

We study the discrete dynamics of mini-batch gradient descent with random reshuffling for least squares regression. We show that the training and generalization errors depend on a sample cross-covariance matrix $Z$ between the original features $X$ and a set of new features $\widetilde{X}$ in which each feature is modified by the mini-batches that appear before it during the learning process in an averaged way. Using this representation, we establish that the dynamics of mini-batch and full-batch gradient descent agree up to leading order with respect to the step size using the linear scaling rule. However, mini-batch gradient descent with random reshuffling exhibits a subtle dependence on the step size that a gradient flow analysis cannot detect, such as converging to a limit that depends on the step size. By comparing $Z$, a non-commutative polynomial of random matrices, with the sample covariance matrix of $X$ asymptotically, we demonstrate that batching affects the dynamics by resulting in a form of shrinkage on the spectrum.

View on arXiv PDF

Similar