The Positivity of the Neural Tangent Kernel
This paper provides a sharp theoretical result for understanding training dynamics in deep learning, although the contribution is incremental, since it builds on prior work on the NTK.
The paper addresses the question of when the Neural Tangent Kernel (NTK) is strictly positive definite. It shows that, for feedforward networks of any depth, the NTK is strictly positive definite whenever the activation function is non-polynomial, a property that is tied to the memorization capacity of wide neural networks.
The Neural Tangent Kernel (NTK) has emerged as a fundamental concept in the study of wide neural networks. In particular, it is known that the positivity of the NTK is directly related to the memorization capacity of sufficiently wide networks, i.e., to the possibility of reaching zero training loss via gradient descent. Here we improve on previous works and obtain a sharp result concerning the positivity of the NTK of feedforward networks of any depth. More precisely, we show that, for any non-polynomial activation function, the NTK is strictly positive definite. Our results are based on a novel characterization of polynomial functions, which is of independent interest.
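The contrast between polynomial and non-polynomial activations can be illustrated numerically. The sketch below is not the paper's construction: it builds the empirical (finite-width) NTK Gram matrix of a two-layer network with biases and compares its smallest eigenvalue for tanh and for the identity activation. The function name `empirical_ntk`, the width, the standard normal initialization, and the 1/sqrt(width) scaling are all assumptions made for illustration, and a finite-width check can only hint at the infinite-width statement.

```python
import numpy as np

def empirical_ntk(X, act, act_grad, width=2000, seed=0):
    """Empirical NTK Gram matrix of f(x) = a . act(W x + b) / sqrt(width).

    Hypothetical helper: parametrization and initialization are assumptions,
    not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((width, d))   # hidden weights
    b = rng.standard_normal(width)        # hidden biases
    a = rng.standard_normal(width)        # output weights

    Z = X @ W.T + b                       # pre-activations, shape (n, width)
    X1 = np.hstack([X, np.ones((n, 1))])  # inputs with a constant 1 for the bias

    # Gradient of f(x_i) with respect to each parameter group.
    J_a = act(Z)                                            # d f / d a_j
    J_wb = (a * act_grad(Z))[:, :, None] * X1[:, None, :]   # d f / d (w_j, b_j)
    J = np.hstack([J_a, J_wb.reshape(n, -1)]) / np.sqrt(width)

    # K[i, j] is the inner product of the parameter gradients at x_i and x_j.
    return J @ J.T

# Six distinct points in the plane (arbitrary choice).
X = np.random.default_rng(1).standard_normal((6, 2))

K_tanh = empirical_ntk(X, np.tanh, lambda z: 1.0 - np.tanh(z) ** 2)
K_lin = empirical_ntk(X, lambda z: z, np.ones_like)

# eigvalsh returns eigenvalues in ascending order.
min_eig_tanh = np.linalg.eigvalsh(K_tanh)[0]
min_eig_lin = np.linalg.eigvalsh(K_lin)[0]

print("smallest eigenvalue, tanh:    ", min_eig_tanh)
print("smallest eigenvalue, identity:", min_eig_lin)
```

With the identity activation every parameter gradient is a function of (x, 1) of degree at most two in a way that keeps the Gram matrix at rank at most 3 for two-dimensional inputs, so it is singular on six points. For tanh no such finite-dimensional restriction applies, and the smallest eigenvalue is expected to stay clearly above numerical zero.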