LGCVDSJun 15, 2021

Scaling Neural Tangent Kernels via Sketching and Random Features

arXiv:2106.07880v238 citations
Originality Incremental advance
AI Analysis

This work addresses scalability issues for researchers and practitioners using NTK-based methods in machine learning, though it is incremental as it builds on existing NTK and approximation techniques.

The authors tackled the computational bottleneck of Neural Tangent Kernel (NTK) methods for large-scale learning by developing a fast approximation algorithm using sketching and random features, achieving a 150x speedup on CIFAR-10 while matching exact CNTK accuracy.

The Neural Tangent Kernel (NTK) characterizes the behavior of infinitely-wide neural networks trained under least squares loss by gradient descent. Recent works also report that NTK regression can outperform finitely-wide neural networks trained on small-scale datasets. However, the computational complexity of kernel methods has limited its use in large-scale learning tasks. To accelerate learning with NTK, we design a near input-sparsity time approximation algorithm for NTK, by sketching the polynomial expansions of arc-cosine kernels: our sketch for the convolutional counterpart of NTK (CNTK) can transform any image using a linear runtime in the number of pixels. Furthermore, we prove a spectral approximation guarantee for the NTK matrix, by combining random features (based on leverage score sampling) of the arc-cosine kernels with a sketching algorithm. We benchmark our methods on various large-scale regression and classification tasks and show that a linear regressor trained on our CNTK features matches the accuracy of exact CNTK on CIFAR-10 dataset while achieving 150x speedup.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes