LG · ML · Mar 28, 2020

Memorizing Gaussians with no over-parameterization via gradient descent on neural networks

arXiv:2003.12895v1 · 15 citations
AI Analysis

This paper provides theoretical guarantees for memorization by neural networks without over-parameterization, which matters for understanding optimization and generalization in machine learning.

The paper tackles the problem of memorizing Gaussian data points with neural networks, proving that a single gradient descent step on a two-layer network with orthogonal initialization can memorize Ω(dq/log⁴(d)) independent randomly labeled Gaussians in ℝ^d.

We prove that a single step of gradient descent over a depth-two network, with $q$ hidden neurons, starting from orthogonal initialization, can memorize $\Omega\left(\frac{dq}{\log^4(d)}\right)$ independent and randomly labeled Gaussians in $\mathbb{R}^d$. The result is valid for a large class of activation functions, which includes the absolute value.
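The setting in the abstract can be sketched numerically. The following is a minimal illustration, not the paper's construction: it builds a depth-two network with $q$ hidden neurons and the absolute-value activation, initializes the hidden layer with orthonormal rows, and takes one gradient step on randomly labeled Gaussian inputs. The logistic loss, the fixed balanced output weights `a`, training only the hidden layer, the step size `lr`, and the sizes `d`, `q`, `n` are all assumptions made for this sketch; the abstract does not specify them.

```python
import numpy as np


def orthogonal_init(q, d, rng):
    """Return a (q, d) matrix with orthonormal rows (requires q <= d)."""
    g = rng.standard_normal((d, q))
    q_mat, _ = np.linalg.qr(g)  # (d, q) with orthonormal columns
    return q_mat.T


def forward(W, a, X):
    """Depth-two network f(x) = sum_j a_j * |<w_j, x>| for each row x of X."""
    return np.abs(X @ W.T) @ a


def logistic_loss(W, a, X, y):
    """Mean logistic loss; an assumed choice, not taken from the paper."""
    return np.mean(np.logaddexp(0.0, -y * forward(W, a, X)))


def one_gradient_step(W, a, X, y, lr):
    """One gradient-descent step on the hidden-layer weights W only."""
    pre = X @ W.T                        # (n, q) pre-activations
    margins = y * (np.abs(pre) @ a)      # (n,)
    dloss_df = -y / (1.0 + np.exp(margins))  # d loss_i / d f_i
    # d f_i / d w_j = a_j * sign(<w_j, x_i>) * x_i, averaged over the samples
    grad = ((dloss_df[:, None] * np.sign(pre)) * a[None, :]).T @ X / len(y)
    return W - lr * grad


def run_demo(d=64, q=32, n=200, lr=50.0, seed=0):
    """Hypothetical experiment: compare loss and training accuracy around one step."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))      # independent Gaussian inputs in R^d
    y = rng.choice([-1.0, 1.0], size=n)  # random labels
    W0 = orthogonal_init(q, d, rng)
    # Fixed output weights with balanced signs (an assumption of this sketch).
    a = np.concatenate([np.ones(q // 2), -np.ones(q - q // 2)]) / np.sqrt(q)
    W1 = one_gradient_step(W0, a, X, y, lr)
    return {
        "loss_before": logistic_loss(W0, a, X, y),
        "loss_after": logistic_loss(W1, a, X, y),
        "acc_before": np.mean(np.sign(forward(W0, a, X)) == y),
        "acc_after": np.mean(np.sign(forward(W1, a, X)) == y),
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.4f}")
```

Varying `n` relative to `d * q` in `run_demo` is one way to probe how the number of samples a single step can fit relates to the number of hidden-layer parameters. The sketch makes no claim to reproduce the paper's constants or its $\log^4(d)$ factor.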

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
