LG · Apr 7, 2021

Spectral Analysis of the Neural Tangent Kernel for Deep Residual Networks

arXiv:2104.03093v1 · 24 citations
AI Analysis

This work offers incremental theoretical insight into how deep residual networks behave in the over-parameterized regime, and is mainly relevant to researchers in machine learning theory.

The authors study deep residual networks through their neural tangent kernels (ResNTK). For inputs distributed uniformly on a hypersphere, they show that the eigenfunctions of ResNTK are the spherical harmonics and that the eigenvalues decay polynomially with frequency k as k^{-d}, similar to fully connected networks and the Laplace kernel. They also show that, depending on a hyper-parameter that balances the skip and residual connections, ResNTK either becomes spiky with depth or keeps a stable shape.

Deep residual network architectures have been shown to achieve superior accuracy over classical feed-forward networks, yet their success is still not fully understood. Focusing on massively over-parameterized, fully connected residual networks with ReLU activation through their respective neural tangent kernels (ResNTK), we provide here a spectral analysis of these kernels. Specifically, we show that, much like the NTK for fully connected networks (FC-NTK), for inputs distributed uniformly on the hypersphere $\mathbb{S}^{d-1}$, the eigenfunctions of ResNTK are the spherical harmonics and the eigenvalues decay polynomially with frequency $k$ as $k^{-d}$. These results in turn imply that the set of functions in their Reproducing Kernel Hilbert Space is identical to that of FC-NTK, and consequently also to that of the Laplace kernel. We further show, by drawing on the analogy to the Laplace kernel, that depending on the choice of a hyper-parameter that balances the skip and residual connections, ResNTK can either become spiky with depth, as with FC-NTK, or maintain a stable shape.
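
The $k^{-d}$ decay rate can be checked numerically in the simplest case, $d = 2$, where the hypersphere is the unit circle and the spherical harmonics reduce to Fourier modes. The sketch below is illustrative only and is not taken from the paper: it uses the Laplace kernel $\exp(-c\,\|x - y\|)$ rather than ResNTK itself, and the function names, the bandwidth `c`, the grid size `n`, and the fitted frequency range are all assumptions made for the example. On an equally spaced grid the kernel matrix is circulant, so its eigenvalues are given by the discrete Fourier transform of one row.

```python
import numpy as np


def laplace_kernel_spectrum(n=4096, c=1.0):
    """Approximate eigenvalues of the Laplace kernel on the unit circle S^1.

    Hypothetical helper for illustration. Entry k of the result is the
    eigenvalue associated with Fourier frequency k.
    """
    # Equally spaced angles; each one is the angle between a fixed point and
    # another grid point on the circle.
    theta = 2.0 * np.pi * np.arange(n) / n
    # Euclidean (chordal) distance between two unit vectors at angle theta.
    dist = 2.0 * np.abs(np.sin(theta / 2.0))
    row = np.exp(-c * dist)
    # The kernel matrix on this grid is circulant, so its eigenvalues are the
    # DFT of one row. Dividing by n approximates the integral operator under
    # the uniform measure. The row is symmetric, so the DFT is real.
    return np.fft.rfft(row).real / n


def decay_slope(eig, k_min=8, k_max=64):
    """Least-squares slope of log(eigenvalue) against log(frequency)."""
    ks = np.arange(k_min, k_max + 1)
    slope, _ = np.polyfit(np.log(ks), np.log(eig[ks]), 1)
    return float(slope)


if __name__ == "__main__":
    eig = laplace_kernel_spectrum()
    # Under these assumptions the slope is expected to be close to -d = -2.
    print("fitted log-log slope:", decay_slope(eig))
```

The fit is restricted to moderate frequencies because the power law is asymptotic: very low frequencies carry lower-order corrections, and frequencies near the grid limit are affected by aliasing.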

