Training behavior of deep neural network in frequency domain
This work addresses a foundational problem in machine learning and offers insights into DNN optimization and generalization, though the contribution is incremental because it builds on existing empirical studies.
The paper tackles the puzzle of why deep neural networks generalize well even though they are capable of overfitting. It identifies the Frequency Principle: during training, DNNs first capture the low-frequency components of the target function and only later the high-frequency ones. The phenomenon is observed across various DNN structures and helps explain early stopping and generalization.
Why deep neural networks (DNNs) that are capable of overfitting often generalize well in practice is a mystery [#zhang2016understanding]. To find a potential mechanism, we study the implicit biases underlying the training process of DNNs. In this work, for both real and synthetic datasets, we empirically find that a DNN with common settings first quickly captures the dominant low-frequency components and then relatively slowly captures the high-frequency ones. We call this phenomenon the Frequency Principle (F-Principle). In our experiments, the F-Principle is observed across DNNs of various structures, activation functions, and training algorithms. We also illustrate how the F-Principle helps explain the effect of early stopping as well as the generalization of DNNs. The F-Principle potentially points to a general principle underlying DNN optimization and generalization.
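To make the F-Principle concrete, the following is a minimal sketch of one way to observe it; it is not the authors' experimental code. It trains a small one-hidden-layer tanh network by full-batch gradient descent on a 1D target with one low-frequency component (DFT index k = 1) and one high-frequency component (k = 8), and logs the relative error |DFT(output)[k] - DFT(target)[k]| / |DFT(target)[k]| at both indices. The target, width, learning rate, step count, and Gaussian initialization are illustrative assumptions, and `train_and_track` and `relative_freq_error` are hypothetical helper names.

```python
import numpy as np


def relative_freq_error(pred, target, k):
    """Relative error of pred vs. target at DFT index k."""
    p_hat = np.fft.rfft(pred)
    t_hat = np.fft.rfft(target)
    return abs(p_hat[k] - t_hat[k]) / abs(t_hat[k])


def train_and_track(k_low=1, k_high=8, n=256, width=40, lr=0.02,
                    steps=5000, log_every=100, seed=0):
    """Hypothetical helper: train a tanh net and log per-frequency errors."""
    rng = np.random.default_rng(seed)
    # Uniform grid on [-1, 1): sin(k*pi*x) completes exactly k periods,
    # so each target component occupies the single DFT bin k.
    x = np.linspace(-1.0, 1.0, n, endpoint=False).reshape(-1, 1)
    y = np.sin(k_low * np.pi * x) + np.sin(k_high * np.pi * x)

    # One hidden layer of tanh units; the initialization scales are assumptions.
    W1 = rng.normal(0.0, 1.0, size=(1, width))
    b1 = rng.normal(0.0, 1.0, size=width)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, 1))
    b2 = np.zeros(1)

    low_err, high_err = [], []
    for step in range(steps + 1):
        # Forward pass.
        H = np.tanh(x @ W1 + b1)  # hidden activations, shape (n, width)
        out = H @ W2 + b2         # network output, shape (n, 1)
        if step % log_every == 0:
            low_err.append(relative_freq_error(out[:, 0], y[:, 0], k_low))
            high_err.append(relative_freq_error(out[:, 0], y[:, 0], k_high))
        if step == steps:
            break
        # Backward pass for L = 0.5 * mean((out - y)**2).
        d_out = (out - y) / n
        gW2 = H.T @ d_out
        gb2 = d_out.sum(axis=0)
        dZ = (d_out @ W2.T) * (1.0 - H ** 2)  # tanh'(z) = 1 - tanh(z)**2
        gW1 = x.T @ dZ
        gb1 = dZ.sum(axis=0)
        # Plain full-batch gradient descent update.
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return np.array(low_err), np.array(high_err)


if __name__ == "__main__":
    low, high = train_and_track()
    # With the default log_every=100, entry i corresponds to step i * 100.
    for i in range(0, len(low), 10):
        print(f"step {i * 100:5d}  rel. error k=1: {low[i]:.3f}  k=8: {high[i]:.3f}")
```

Running the script prints both relative errors every 1000 steps. If the F-Principle holds in this toy setting, the k = 1 error should shrink long before the k = 8 error moves appreciably. Plain gradient descent was chosen over an adaptive optimizer only to keep the sketch dependency-free apart from NumPy and its dynamics easy to reason about.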