LG ML · Mar 28, 2020

Memorizing Gaussians with no over-parameterization via gradient descent on neural networks

arXiv:2003.12895v1 · 9.6 · 15 citations

Originality: Highly original

AI Analysis

This paper provides theoretical guarantees for memorization by neural networks without over-parameterization, a question that matters for understanding both optimization and generalization in machine learning.

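The paper tackles the problem of memorizing Gaussian data points with neural networks. It proves that a single gradient descent step on a two-layer network with q hidden neurons and orthogonal initialization can memorize Ω(dq/log⁴(d)) independent, randomly labeled Gaussians in ℝ^d. Since such a network has dq first-layer weights, the number of memorized points is within a polylogarithmic factor of the parameter count, which is what "no over-parameterization" refers to.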

We prove that a single step of gradient descent on a depth-two network with $q$ hidden neurons, starting from orthogonal initialization, can memorize $\Omega\left(\frac{dq}{\log^4(d)}\right)$ independent and randomly labeled Gaussians in $\mathbb{R}^d$. The result is valid for a large class of activation functions, which includes the absolute value.
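A small numerical sketch can make the claim concrete. The code below takes one gradient-descent step on the first layer of a depth-two network with absolute-value activation and orthogonal initialization, then checks on how many of m randomly labeled Gaussian points the sign of the resulting change in output matches the label. Everything beyond that outline is an assumption of this sketch and not the paper's construction or proof: inputs drawn from N(0, I_d/d), a fixed second layer of random ±1/√q weights, the correlation loss -y·f(x), the step size lr, the sizes d, q, m, and scoring the centered output f_W1(x) - f_W0(x) so that the network outputs zero at initialization. The helper names orthogonal_init and one_step_memorization are hypothetical.

```python
import numpy as np


def orthogonal_init(q: int, d: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: a q x d weight matrix with orthonormal rows."""
    if q > d:
        raise ValueError("orthonormal rows require q <= d")
    # Reduced QR of a d x q Gaussian matrix gives d x q orthonormal columns.
    basis, _ = np.linalg.qr(rng.standard_normal((d, q)))
    return basis.T


def one_step_memorization(d: int = 100, q: int = 50, m: int = 400,
                          lr: float = 0.1, seed: int = 0) -> float:
    """Fraction of random labels fit after ONE gradient step (sketch only)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((m, d)) / np.sqrt(d)      # ||x_i||^2 is close to 1
    y = rng.choice([-1.0, 1.0], size=m)               # independent random labels
    W0 = orthogonal_init(q, d, rng)                   # first layer, q x d
    a = rng.choice([-1.0, 1.0], size=q) / np.sqrt(q)  # fixed second layer (assumed)

    def f(W: np.ndarray) -> np.ndarray:
        # Depth-two network with absolute-value activation: sum_j a_j |w_j . x|.
        return np.abs(X @ W.T) @ a

    # Correlation loss L(W) = -(1/m) sum_i y_i f_W(x_i), differentiated at W0:
    # dL/dw_j = -(a_j / m) sum_i y_i sign(w_j . x_i) x_i
    S = np.sign(X @ W0.T)                             # m x q derivatives of |.|
    grad = -(a[:, None] / m) * ((S * y[:, None]).T @ X)
    W1 = W0 - lr * grad                               # the single GD step

    # Centered output (assumption): only the change caused by the step counts.
    change = f(W1) - f(W0)
    return float(np.mean(np.sign(change) == y))


if __name__ == "__main__":
    print(one_step_memorization())
```

Calling one_step_memorization() returns the fraction of training points whose label is fit after the single step. Heuristically, each point's own term in the update contributes a signal of fixed size while the cross-terms from the other points grow with m, so increasing m toward q * d (5000 with the defaults) should make that fraction fall; this is a reading of the sketch, not a statement of the paper's bound.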

