LG DATA-AN Jan 30, 2021

Linear Frequency Principle Model to Understand the Absence of Overfitting in Neural Networks

Yaoyu Zhang, Tao Luo, Zheng Ma, Zhi-Qin John Xu

arXiv:2102.00200v1, 23 citations

Originality Highly original

AI Analysis

This paper addresses a foundational open question in machine learning and offers a theoretical explanation for why neural networks do not overfit.

The paper tackles the puzzle of why heavily parameterized neural networks avoid overfitting. It proposes a linear frequency principle model, shows that low-frequency dominance of the target function is the key condition for non-overfitting, and verifies this experimentally.

Why heavily parameterized neural networks (NNs) do not overfit the data is an important, long-standing open question. We propose a phenomenological model of NN training to explain this non-overfitting puzzle. Our linear frequency principle (LFP) model accounts for a key dynamical feature of NNs: they learn low frequencies first, irrespective of microscopic details. Theory based on our LFP model shows that low-frequency dominance of target functions is the key condition for the non-overfitting of NNs, and this is verified by experiments. Furthermore, through an ideal two-layer NN, we unravel how detailed microscopic NN training dynamics statistically give rise to an LFP model with quantitative predictive power.
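
To make "learn low frequencies first" concrete, here is a minimal toy sketch, not the paper's model. It assumes each Fourier mode of the learned function relaxes linearly toward the target at a rate that decays with frequency. The rate law gamma = |k|^(-power), the zero initial condition, and the name simulate_decay are illustrative assumptions that do not come from the abstract.

```python
import numpy as np


def simulate_decay(freqs, target_amps, t, power=2.0):
    """Toy linear dynamics in the frequency domain (illustrative assumption).

    Each mode k obeys d h_k / dt = -gamma_k * (h_k - f_k) with h_k(0) = 0,
    where gamma_k = |k| ** (-power) is a hypothetical decaying rate.
    Returns the closed-form solution h_k(t) = f_k * (1 - exp(-gamma_k * t)).
    """
    freqs = np.asarray(freqs, dtype=float)
    target_amps = np.asarray(target_amps, dtype=float)
    gamma = np.abs(freqs) ** (-power)  # lower frequency -> faster convergence
    return target_amps * (1.0 - np.exp(-gamma * t))


if __name__ == "__main__":
    freqs = np.array([1.0, 2.0, 4.0, 8.0])
    target = np.ones_like(freqs)  # equal target amplitude at every frequency
    for t in (1.0, 10.0, 100.0):
        learned = simulate_decay(freqs, target, t)
        rel_err = np.abs(learned - target) / np.abs(target)
        # Print the remaining relative error per frequency at time t.
        print(f"t={t:6.1f}  relative error per frequency: {np.round(rel_err, 3)}")
```

In this toy setting the remaining error at any fixed time grows with frequency, so a target dominated by low frequencies is fitted well before high-frequency components contribute noticeably.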
