Generalized Leverage Score Sampling for Neural Networks
This work addresses the computational efficiency of neural network training, a concern for researchers and practitioners alike, but it appears incremental because it extends existing leverage score sampling results to new contexts.
The paper tackles the problem of accelerating neural network training by generalizing leverage score sampling for kernel methods to a broader class of kernels and applying it to deep learning theory. It shows a connection between neural network initialization and approximating the neural tangent kernel with random features, and it proves that a regularized neural network is equivalent to neural tangent kernel ridge regression under both classical random Gaussian initialization and leverage score sampling initialization.
Leverage score sampling is a powerful technique originating from theoretical computer science that can be used to speed up algorithms for a large number of fundamental problems, e.g., linear regression, linear programming, semi-definite programming, the cutting plane method, graph sparsification, maximum matching, and max-flow. Recently, it has been shown that leverage score sampling helps to accelerate kernel methods [Avron, Kapralov, Musco, Musco, Velingker and Zandieh 17]. In this work, we generalize the results of [Avron, Kapralov, Musco, Musco, Velingker and Zandieh 17] to a broader class of kernels. We further bring leverage score sampling into the field of deep learning theory.
$\bullet$ We show the connection between the initialization for neural network training and approximating the neural tangent kernel with random features.
$\bullet$ We prove the equivalence between a regularized neural network and neural tangent kernel ridge regression under both classical random Gaussian initialization and leverage score sampling initialization.
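
To make the sampling idea concrete, the following is a minimal sketch of the standard ridge leverage scores of a kernel matrix $K$, namely $\ell_i = \big(K (K + \lambda I)^{-1}\big)_{ii}$, together with importance sampling of data points in proportion to them. It is not the generalized scheme developed in this work: the Gaussian kernel, the bandwidth, the regularization value `lam`, and all function names are assumptions chosen only for illustration.

```python
import numpy as np


def gaussian_kernel(X, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix for the rows of X (illustrative choice of kernel)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))


def ridge_leverage_scores(K, lam):
    """Return diag(K (K + lam I)^{-1}), the ridge leverage scores of K."""
    n = K.shape[0]
    # K and (K + lam I)^{-1} commute, so solving (K + lam I) Z = K gives the same diagonal.
    return np.diag(np.linalg.solve(K + lam * np.eye(n), K)).copy()


def sample_by_leverage(scores, num_samples, rng):
    """Sample indices with probability proportional to the scores.

    Also returns the weights 1 / (num_samples * p_i) used to reweight
    the sampled points in importance-sampling estimators.
    """
    probs = np.clip(scores, 0.0, None)
    probs = probs / probs.sum()
    idx = rng.choice(len(scores), size=num_samples, replace=True, p=probs)
    weights = 1.0 / (num_samples * probs[idx])
    return idx, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))          # hypothetical data set
    K = gaussian_kernel(X, bandwidth=1.0)
    scores = ridge_leverage_scores(K, lam=0.1)
    idx, weights = sample_by_leverage(scores, num_samples=20, rng=rng)
    print("statistical dimension:", scores.sum())
    print("sampled indices:", idx)
```

The scores sum to the statistical dimension $\mathrm{tr}\big(K (K + \lambda I)^{-1}\big)$, which is why the number of samples needed typically scales with that quantity rather than with the number of data points. Computing the scores exactly as above costs a full linear solve, so this sketch only illustrates the definition, not a fast algorithm.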