On the Inductive Bias of Neural Tangent Kernels
This work offers theoretical insight into why gradient descent generalizes well in over-parameterized regimes, a question central to understanding optimization in deep learning.
The paper analyzes the inductive bias of neural tangent kernels (NTKs) for over-parameterized neural networks. It examines the smoothness, approximation, and stability properties of the associated function space, including stability to deformations in convolutional networks, and compares NTKs to other kernels for similar architectures.
State-of-the-art neural networks are heavily over-parameterized, making the optimization algorithm a crucial ingredient for learning predictive models with good generalization properties. A recent line of work has shown that in a certain over-parameterized regime, the learning dynamics of gradient descent are governed by a specific kernel obtained at initialization, called the neural tangent kernel. We study the inductive bias of learning in such a regime by analyzing this kernel and the corresponding function space (RKHS). In particular, we study smoothness, approximation, and stability properties of functions with finite norm, including stability to image deformations in the case of convolutional networks, and compare the NTK to other known kernels for similar architectures.
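To make "a kernel obtained at initialization" concrete, here is a minimal sketch, not code from the paper. It assumes a two-layer ReLU network f(x) = sqrt(2/m) Σ_j v_j relu(w_j · x) with standard Gaussian initialization, both layers trained, and unit-norm inputs. Under these assumptions the empirical tangent kernel ⟨∇θ f(x), ∇θ f(y)⟩ at a random initialization should approach the closed form κ(u) = u κ0(u) + κ1(u), built from the arc-cosine kernels κ0 and κ1 of the cosine similarity u = x · y, as the width m grows. The function names `ntk_relu_two_layer` and `empirical_ntk` are hypothetical.

```python
import numpy as np


def ntk_relu_two_layer(u):
    """Infinite-width NTK of a two-layer ReLU net for unit-norm inputs.

    Assumes both layers are trained; u is the cosine similarity x . y.
    """
    u = np.clip(u, -1.0, 1.0)  # guard against rounding outside [-1, 1]
    kappa0 = (np.pi - np.arccos(u)) / np.pi  # arc-cosine kernel of degree 0
    kappa1 = (u * (np.pi - np.arccos(u)) + np.sqrt(1.0 - u**2)) / np.pi  # degree 1
    return u * kappa0 + kappa1


def empirical_ntk(x, y, m, rng):
    """<grad f(x), grad f(y)> at a random init of f(x) = sqrt(2/m) sum_j v_j relu(w_j . x)."""
    d = x.shape[0]
    W = rng.standard_normal((m, d))  # first-layer weights w_j ~ N(0, I)
    v = rng.standard_normal(m)       # second-layer weights v_j ~ N(0, 1)
    px, py = W @ x, W @ y
    # df/dv_j = sqrt(2/m) * relu(w_j . x): product of the two gradients, per neuron
    term_v = np.maximum(px, 0.0) * np.maximum(py, 0.0)
    # df/dw_j = sqrt(2/m) * v_j * 1{w_j . x > 0} * x: inner product of the two gradients
    term_w = v**2 * (px > 0) * (py > 0) * (x @ y)
    return (2.0 / m) * np.sum(term_v + term_w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(5)
    x /= np.linalg.norm(x)
    y = rng.standard_normal(5)
    y /= np.linalg.norm(y)
    u = x @ y
    # The empirical kernel should concentrate around the closed form as m grows.
    for m in (100, 10_000, 200_000):
        print(m, empirical_ntk(x, y, m, rng), ntk_relu_two_layer(u))
```

The Monte Carlo estimate fluctuates at roughly the 1/sqrt(m) scale, so at small widths the kernel depends noticeably on the random draw. This is one informal way to see why the fixed-kernel description is tied to the over-parameterized regime.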