The Convergence Rate of Neural Networks for Learned Functions of Different Frequencies
This work offers incremental theoretical insight into neural network training dynamics, addressing one specific gap: how the frequency of a target function affects the rate at which it is learned.
The paper studies how the frequency of a function affects the speed at which a neural network learns it. It shows that shallow networks without bias cannot represent or learn simple, low-frequency functions with odd frequencies, and it derives predictions for learning times that match empirical results.
We study the relationship between the frequency of a function and the speed at which a neural network learns it. We build on recent results that show that the dynamics of overparameterized neural networks trained with gradient descent can be well approximated by a linear system. When normalized training data is uniformly distributed on a hypersphere, the eigenfunctions of this linear system are spherical harmonic functions. We derive the corresponding eigenvalues for each frequency after introducing a bias term in the model. This bias term had been omitted from the linear network model without significantly affecting previous theoretical results. However, we show theoretically and experimentally that a shallow neural network without bias cannot represent or learn simple, low-frequency functions with odd frequencies. Our results lead to specific predictions of the time it will take a network to learn functions of varying frequency. These predictions match the empirical behavior of both shallow and deep networks.
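The link between eigenvalues and learning time can be illustrated with a small numerical sketch on the circle, the simplest hypersphere, where the spherical harmonics reduce to Fourier modes. Everything here is an assumption made for illustration and is not taken from the abstract: the two arc-cosine style kernel forms used for the bias-free and with-bias linear systems, the division by the number of points, the learning rate, and the function names. The sketch reads each frequency's eigenvalue off the kernel matrix, then uses the standard linear-system argument that the residual along an eigenvector with eigenvalue `lam` shrinks as `(1 - lr * lam) ** t`.

```python
import numpy as np

def kernel_matrix(thetas, with_bias):
    """Kernel of the linearized shallow ReLU model on unit-circle inputs.

    Both closed forms are ASSUMED for illustration; they are not given
    in the abstract.
    """
    # Inner products of unit vectors on the circle: cosine of the angle gap.
    u = np.clip(np.cos(thetas[:, None] - thetas[None, :]), -1.0, 1.0)
    angle_term = np.pi - np.arccos(u)
    if with_bias:
        H = (u + 1.0) * angle_term / (4.0 * np.pi)
    else:
        H = u * angle_term / (2.0 * np.pi)
    # Dividing by n is a scaling choice so that lr = 1 is a stable step size.
    return H / len(thetas)

def eigenvalue_for_frequency(H, thetas, k):
    # With equally spaced points H is circulant, so cos(k * theta) is an
    # eigenvector and its Rayleigh quotient equals the eigenvalue.
    v = np.cos(k * thetas)
    v = v / np.linalg.norm(v)
    return float(v @ H @ v)

def predicted_iterations(lam, lr, delta, tol=1e-10):
    # Solve (1 - lr * lam) ** t = delta for t. A (numerically) zero
    # eigenvalue means that component is never learned.
    if lam < tol:
        return float("inf")
    return float(np.log(delta) / np.log(1.0 - lr * lam))

n = 64  # even number of equally spaced training points (hypothetical choice)
thetas = 2.0 * np.pi * np.arange(n) / n
H_nobias = kernel_matrix(thetas, with_bias=False)
H_bias = kernel_matrix(thetas, with_bias=True)

for k in (1, 2, 3, 4):
    lam_nb = eigenvalue_for_frequency(H_nobias, thetas, k)
    lam_b = eigenvalue_for_frequency(H_bias, thetas, k)
    t_nb = predicted_iterations(lam_nb, lr=1.0, delta=0.05)
    t_b = predicted_iterations(lam_b, lr=1.0, delta=0.05)
    print(f"k={k}: no bias {t_nb:.0f} iterations, with bias {t_b:.0f} iterations")
```

Under these assumptions, the bias-free eigenvalue at frequency 3 is zero up to floating-point error, so the predicted learning time is infinite. With the bias term every listed frequency has a positive eigenvalue, and the predicted number of iterations grows with frequency.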