LGFeb 7, 2022

Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably

arXiv:2202.03535v11 citations
Originality Incremental advance
AI Analysis

This provides theoretical justification for noise regularization in over-parameterized learning, relevant for understanding neural network training, though it is incremental as it builds on existing concepts.

The paper tackles the problem of recovering a rank one matrix from noisy observations using over-parameterized models, showing that randomly perturbed gradient descent achieves a mean square error of O(σ²/d), compared to O(σ²) without perturbation.

We investigate the role of noise in optimization algorithms for learning over-parameterized models. Specifically, we consider the recovery of a rank one matrix $Y^*\in R^{d\times d}$ from a noisy observation $Y$ using an over-parameterization model. We parameterize the rank one matrix $Y^*$ by $XX^\top$, where $X\in R^{d\times d}$. We then show that under mild conditions, the estimator, obtained by the randomly perturbed gradient descent algorithm using the square loss function, attains a mean square error of $O(σ^2/d)$, where $σ^2$ is the variance of the observational noise. In contrast, the estimator obtained by gradient descent without random perturbation only attains a mean square error of $O(σ^2)$. Our result partially justifies the implicit regularization effect of noise when learning over-parameterized models, and provides new understanding of training over-parameterized neural networks.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes