LG, AI, IT, ST, ML | Jul 3, 2018

Training behavior of deep neural network in frequency domain

arXiv:1807.01251v6 | 399 citations
Originality: Incremental advance
AI Analysis

The paper addresses a foundational problem in machine learning by offering insight into DNN optimization and generalization, but the advance is incremental because it builds on existing empirical studies.

The paper tackles the question of why deep neural networks that are capable of overfitting often generalize well. It identifies the Frequency Principle: during training, DNNs first capture the low-frequency components of the target and only later the high-frequency ones. The phenomenon is observed across various DNN structures, and the paper uses it to help explain early stopping and generalization.

Why deep neural networks (DNNs) that are capable of overfitting often generalize well in practice is a mystery [#zhang2016understanding]. To find a potential mechanism, we study the implicit biases underlying the training process of DNNs. In this work, for both real and synthetic datasets, we empirically find that a DNN with common settings first quickly captures the dominant low-frequency components and then, relatively slowly, captures the high-frequency ones. We call this phenomenon the Frequency Principle (F-Principle). In our experiments, the F-Principle can be observed over DNNs of various structures, activation functions, and training algorithms. We also illustrate how the F-Principle helps explain the effect of early stopping as well as the generalization of DNNs. The F-Principle potentially provides insight into a general principle underlying DNN optimization and generalization.
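
One way to probe the behavior described in the abstract is to fit a small network to a 1-D target with one low-frequency and one high-frequency component, and to log the relative error of each Fourier coefficient during training. The sketch below is illustrative only and is not the paper's code: the target `sin(x) + 0.5 sin(5x)`, the one-hidden-layer tanh network, the learning rate, and the function names `relative_freq_error` and `train_and_track` are all assumptions made for this example.

```python
import numpy as np


def relative_freq_error(target, pred, k):
    """Relative error of the k-th DFT coefficient of `pred` against `target`."""
    t_hat = np.fft.rfft(target)[k]
    p_hat = np.fft.rfft(pred)[k]
    return np.abs(p_hat - t_hat) / np.abs(t_hat)


def train_and_track(steps=2000, hidden=64, lr=0.005, seed=0, log_every=100):
    """Train a tiny tanh MLP with full-batch gradient descent on an MSE loss.

    Returns an array with rows (step, error at k=1, error at k=5).
    All hyperparameters are assumptions chosen for this illustration.
    """
    rng = np.random.default_rng(seed)
    n = 128
    # One full period on [-pi, pi), so DFT index k matches sin(k * x).
    x = np.linspace(-np.pi, np.pi, n, endpoint=False).reshape(-1, 1)
    # Dominant low frequency (k=1) plus a weaker high frequency (k=5).
    y = np.sin(x) + 0.5 * np.sin(5 * x)

    W1 = rng.normal(0.0, 1.0, (1, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 1))
    b2 = np.zeros(1)

    history = []
    for step in range(steps + 1):
        # Forward pass.
        h = np.tanh(x @ W1 + b1)
        pred = h @ W2 + b2

        if step % log_every == 0:
            low = relative_freq_error(y[:, 0], pred[:, 0], 1)
            high = relative_freq_error(y[:, 0], pred[:, 0], 5)
            history.append((step, low, high))

        # Backward pass for the mean squared error.
        g_pred = 2.0 * (pred - y) / n
        g_W2 = h.T @ g_pred
        g_b2 = g_pred.sum(axis=0)
        g_z = (g_pred @ W2.T) * (1.0 - h ** 2)
        g_W1 = x.T @ g_z
        g_b1 = g_z.sum(axis=0)

        # Gradient descent update.
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2

    return np.array(history)


if __name__ == "__main__":
    for step, low, high in train_and_track():
        print(f"step {int(step):5d}  rel. error k=1: {low:.3f}  k=5: {high:.3f}")
```

If the F-Principle holds in this setting, the `k=1` column should fall toward zero earlier than the `k=5` column. Stopping training while the high-frequency error is still large corresponds to the early-stopping effect the abstract mentions. Whether and how clearly this shows up depends on the assumed hyperparameters.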

Code Implementations: 1 repo
