Efficient Analysis of the Distilled Neural Tangent Kernel
This work addresses a computational-efficiency problem for researchers and practitioners who use NTK methods in machine learning, although the contribution is incremental because it builds on existing projection and distillation techniques.
The paper tackles the computational bottleneck in neural tangent kernel (NTK) methods by compressing the data dimension using NTK-tuned dataset distillation, achieving a 20-100x reduction in Jacobian calculations and up to five orders of magnitude reduction in computational complexity while preserving predictive performance.
Neural tangent kernel (NTK) methods are computationally limited by the need to evaluate large Jacobians across many data points. Existing approaches reduce this cost primarily by projecting and sketching the Jacobian. We show that NTK computation can also be reduced by compressing the data dimension itself using NTK-tuned dataset distillation. We demonstrate that the neural tangent space spanned by the input data can be induced by dataset distillation, yielding a 20-100$\times$ reduction in required Jacobian calculations. We further show that per-class NTK matrices have low effective rank that is preserved by this reduction. Building on these insights, we propose the distilled neural tangent kernel (DNTK), which combines NTK-tuned dataset distillation with state-of-the-art projection methods to reduce NTK computational complexity by up to five orders of magnitude while preserving kernel structure and predictive performance.
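To make the cost argument concrete, the following is a minimal sketch (not the DNTK method itself) of how an empirical NTK is assembled from per-example Jacobian rows, and why shrinking the number of data points shrinks the number of Jacobian evaluations proportionally. Everything here is an illustrative assumption: the toy model f(x) = sum_j a_j tanh(w_j . x), the helper names `jacobian_row`, `empirical_ntk`, and `effective_rank`, the random data, and the use of the participation ratio tr(K)^2 / ||K||_F^2 as the effective-rank measure (the abstract does not specify which definition is used). The small set `X_small` is a plain subset standing in for a distilled set; no distillation is performed.

```python
import math
import random


def jacobian_row(x, W, a):
    """Gradient of f(x) = sum_j a[j] * tanh(W[j] . x) w.r.t. all parameters."""
    row = []
    for w_j, a_j in zip(W, a):
        h = math.tanh(sum(w * xi for w, xi in zip(w_j, x)))
        row.append(h)                                      # df/da_j
        row.extend(a_j * (1.0 - h * h) * xi for xi in x)   # df/dW_jk
    return row


def empirical_ntk(X, W, a):
    """K[i][j] = <J_i, J_j>, using one Jacobian row per data point."""
    J = [jacobian_row(x, W, a) for x in X]
    return [[sum(p * q for p, q in zip(ji, jj)) for jj in J] for ji in J]


def effective_rank(K):
    """Participation ratio tr(K)^2 / ||K||_F^2 (an assumed definition)."""
    trace = sum(K[i][i] for i in range(len(K)))
    frob_sq = sum(v * v for row in K for v in row)
    return trace * trace / frob_sq


rng = random.Random(0)
d, width, n, m = 4, 8, 100, 5          # hypothetical sizes; n / m = 20
W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(width)]
a = [rng.gauss(0, 1) for _ in range(width)]
X_full = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
X_small = X_full[:m]                   # placeholder subset, NOT a distilled set

K_full = empirical_ntk(X_full, W, a)   # n Jacobian rows, n x n kernel
K_small = empirical_ntk(X_small, W, a) # m Jacobian rows, m x m kernel
jacobian_reduction = n / m

print("Jacobian evaluations:", n, "->", m, f"({jacobian_reduction:.0f}x fewer)")
print("effective rank of full kernel:", round(effective_rank(K_full), 2))
```

In this sketch the kernel has n^2 entries and each entry is an inner product over all parameters, so reducing the number of points from n to m cuts Jacobian evaluations by n/m and kernel entries by (n/m)^2. Whether a small set preserves the kernel structure is exactly what the distillation step is meant to ensure; a random subset, as used here, carries no such guarantee.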