Tight conditions for when the NTK approximation is valid
The work provides a theoretical foundation for understanding NTK behavior in deep learning, though the contribution is incremental in that it refines existing bounds.
The paper establishes tight conditions for when the neural tangent kernel (NTK) approximation holds in lazy training with the square loss. It shows that a rescaling factor of α = O(T) suffices for the approximation to remain valid until training time T, improving on the previous bound of α = O(T²).
We study when the neural tangent kernel (NTK) approximation is valid for training a model with the square loss. In the lazy training setting of Chizat et al. 2019, we show that rescaling the model by a factor of $\alpha = O(T)$ suffices for the NTK approximation to be valid until training time $T$. Our bound is tight and improves on the previous bound of Chizat et al. 2019, which required a larger rescaling factor of $\alpha = O(T^2)$.
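To make the setting concrete, here is a minimal numerical sketch of lazy training on a toy problem. Everything in it is an assumption chosen for illustration and is not taken from the paper: the scalar model h(w) = exp(w) - 1, the target y, the Euler step size, and the helper name `ntk_gap`. It trains the rescaled model alpha * h(w) and its linearization around the initialization side by side with the square loss, then reports how far their outputs are apart at time T. It illustrates only that the gap shrinks as alpha grows; it does not reproduce the paper's bound or its tightness.

```python
import math

def h(w):
    # Toy scalar model with h(0) = 0 (hypothetical choice for illustration).
    return math.expm1(w)

def dh(w):
    # Derivative of h.
    return math.exp(w)

def ntk_gap(alpha, T, y=1.0, dt=1e-3):
    """Euler-discretized gradient flow on the rescaled square loss
    (1 / alpha**2) * 0.5 * (alpha * h(w) - y)**2, run for time T,
    for both the model and its linearization at w0.
    Returns |alpha * h(w(T)) - alpha * h_lin(w_lin(T))|."""
    w0 = 0.0
    w, w_lin = w0, w0
    steps = int(round(T / dt))
    for _ in range(steps):
        # Full model: gradient of the rescaled loss with respect to w.
        resid = alpha * h(w) - y
        w -= dt * dh(w) * resid / alpha
        # Linearized model: h_lin(w) = h(w0) + dh(w0) * (w - w0).
        resid_lin = alpha * (h(w0) + dh(w0) * (w_lin - w0)) - y
        w_lin -= dt * dh(w0) * resid_lin / alpha
    out = alpha * h(w)
    out_lin = alpha * (h(w0) + dh(w0) * (w_lin - w0))
    return abs(out - out_lin)

if __name__ == "__main__":
    for alpha in (1.0, 10.0, 100.0):
        print(alpha, ntk_gap(alpha, T=1.0))
```

In this toy problem both trajectories converge to the same target, so it says nothing about how alpha must scale with T in general; that question is the subject of the paper's analysis.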