Flat minima generalize for low-rank matrix recovery
This work offers theoretical insight into why flat minima generalize well, a central question for researchers and practitioners working with overparameterized models. The contribution is somewhat incremental, since it builds on existing empirical observations.
The paper studies the generalization benefits of flat minima in overparameterized nonlinear models, focusing on low-rank matrix recovery tasks such as matrix sensing and robust PCA. It proves that flat minima exactly recover the ground truth under standard statistical assumptions. For matrix completion, it establishes weak recovery, and experiments suggest that exact recovery holds there as well.
Empirical evidence suggests that for a variety of overparameterized nonlinear models, most notably in neural network training, the growth of the loss around a minimizer strongly impacts its performance. Flat minima -- those around which the loss grows slowly -- appear to generalize well. This work takes a step towards understanding this phenomenon by focusing on the simplest class of overparameterized nonlinear models: those arising in low-rank matrix recovery. We analyze overparameterized matrix and bilinear sensing, robust PCA, covariance matrix estimation, and single hidden layer neural networks with quadratic activation functions. In all cases, we show that flat minima, measured by the trace of the Hessian, exactly recover the ground truth under standard statistical assumptions. For matrix completion, we establish weak recovery, although empirical evidence suggests exact recovery holds here as well. We conclude with synthetic experiments that illustrate our findings and discuss the effect of depth on flat solutions.
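To make the flatness measure concrete, here is a minimal NumPy sketch that computes the trace of the Hessian for one overparameterized instance: matrix sensing with a factorized model. It assumes a least-squares loss f(U) = (1/2m) Σ_i (⟨A_i, U Uᵀ⟩ − y_i)², Gaussian measurement matrices A_i, and a factor U with more columns than the rank of the ground truth. These choices, and the function names sensing_loss, hessian_trace_analytic, and hessian_trace_fd, are illustrative assumptions, not the paper's exact setup or code. The closed form comes from the Gauss-Newton decomposition of the Hessian, and a finite-difference estimate serves as a sanity check.

```python
import numpy as np


def residuals(U, A, y):
    # r_i = <A_i, U U^T> - y_i for each measurement matrix A_i (A has shape (m, d, d)).
    return np.einsum("mij,ij->m", A, U @ U.T) - y


def sensing_loss(U, A, y):
    # Assumed least-squares loss: f(U) = (1/(2m)) * sum_i r_i^2.
    return 0.5 * np.mean(residuals(U, A, y) ** 2)


def hessian_trace_analytic(U, A, y):
    # Hessian of f = (1/m) * sum_i [grad r_i grad r_i^T + r_i * Hess r_i], so
    # tr(Hess f) = (1/m) * sum_i [ ||grad r_i||^2 + r_i * tr(Hess r_i) ].
    m, k = A.shape[0], U.shape[1]
    r = residuals(U, A, y)
    S = A + np.transpose(A, (0, 2, 1))            # S_i = A_i + A_i^T
    grads = np.einsum("mij,jk->mik", S, U)        # grad of r_i w.r.t. U is S_i U
    gauss_newton = np.sum(grads ** 2) / m
    # Hess r_i acts as D -> S_i D on d x k matrices, whose trace is k * tr(S_i).
    curvature = k * np.sum(r * np.trace(S, axis1=1, axis2=2)) / m
    return gauss_newton + curvature


def hessian_trace_fd(U, A, y, h=1e-4):
    # Central second differences along every coordinate of U, summed.
    f0 = sensing_loss(U, A, y)
    total = 0.0
    for idx in np.ndindex(U.shape):
        E = np.zeros_like(U)
        E[idx] = h
        total += (sensing_loss(U + E, A, y) - 2.0 * f0 + sensing_loss(U - E, A, y)) / h**2
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k, m = 6, 4, 40                            # k > rank(ground truth) = 1
    A = rng.standard_normal((m, d, d))
    U_star = rng.standard_normal((d, 1))
    y = residuals(U_star, A, np.zeros(m))         # noiseless measurements of U* U*^T
    U = rng.standard_normal((d, k))
    print(hessian_trace_analytic(U, A, y), hessian_trace_fd(U, A, y))
```

At an interpolating solution every residual is zero, so the curvature term drops out and the trace reduces to (1/m) Σ_i ‖(A_i + A_iᵀ)U‖_F², a quantity that depends only on the factor U and the measurement matrices. Comparing this value across different interpolating factors is one way to rank minima by flatness in this toy setting.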