LG · NE · ML · Nov 30, 2020

Every Model Learned by Gradient Descent Is Approximately a Kernel Machine

arXiv:2012.00152v1 · 81 citations
AI Analysis

This work offers researchers a new interpretation of deep learning models, suggesting that their success may stem from kernel-like behavior rather than representation learning.

This paper demonstrates that deep neural networks trained with gradient descent are approximately equivalent to kernel machines. This means deep networks effectively memorize training data and use it for prediction via a similarity function, rather than discovering new representations.

Deep learning's successes are often attributed to its ability to automatically discover new representations of the data, rather than relying on handcrafted features like other learning methods. We show, however, that deep networks learned by the standard gradient descent algorithm are in fact mathematically approximately equivalent to kernel machines, a learning method that simply memorizes the data and uses it directly for prediction via a similarity function (the kernel). This greatly enhances the interpretability of deep network weights, by elucidating that they are effectively a superposition of the training examples. The network architecture incorporates knowledge of the target function into the kernel. This improved understanding should lead to better learning algorithms.
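
As a rough illustration of the claim above, the sketch below trains a linear model by plain gradient descent on squared loss and then reproduces its prediction as a kernel machine: the initial model's output plus a weighted sum of similarities to the training examples. This is a simplified special case, not the paper's general construction. For a linear model the gradient of the output with respect to the weights is just the input, so the kernel reduces to a dot product and the rewriting is exact; for deep networks the abstract only claims approximate equivalence. The names `train_and_track` and `kernel_predict`, the learning rate, and the step count are all hypothetical choices made for this example.

```python
import numpy as np

def train_and_track(X, y, lr=0.01, steps=200, seed=0):
    """Gradient descent on 0.5 * sum((X @ w - y)**2), tracking per-example coefficients."""
    rng = np.random.default_rng(seed)
    w0 = rng.normal(size=X.shape[1])   # initial weights (the "initial model")
    w = w0.copy()
    coeffs = np.zeros(X.shape[0])      # one coefficient per training example
    for _ in range(steps):
        residual = X @ w - y           # loss derivative w.r.t. each prediction
        w -= lr * X.T @ residual       # standard gradient descent step
        coeffs -= lr * residual        # accumulate each example's contribution
    return w0, w, coeffs

def kernel_predict(x, X, w0, coeffs):
    """Kernel-machine form: initial prediction + sum_i coeffs[i] * K(x, x_i)."""
    similarities = X @ x               # K(x, x_i) = x . x_i for a linear model
    return w0 @ x + coeffs @ similarities

rng = np.random.default_rng(1)
X_train = rng.normal(size=(5, 3))
y_train = rng.normal(size=5)
w0, w, coeffs = train_and_track(X_train, y_train)

x_query = rng.normal(size=3)
direct = w @ x_query                                   # prediction from learned weights
via_kernel = kernel_predict(x_query, X_train, w0, coeffs)
print(np.isclose(direct, via_kernel))
```

The learned weights here equal the initial weights plus a weighted sum of the training inputs, which is the "superposition of the training examples" reading of the weights in this toy setting.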

