LG · Nov 27, 2022

A Kernel Perspective of Skip Connections in Convolutional Networks

arXiv:2211.14810v2 · 14 citations · h-index: 54
Originality: Incremental advance
AI Analysis

This work offers theoretical insight into why ResNets train efficiently, which matters to researchers in deep learning optimization. It is an incremental advance, since it builds on existing kernel methods.

The paper analyzes residual networks (ResNets) through their Gaussian Process and Neural Tangent kernels. It finds that skip connections preserve a frequency bias similar to that of non-residual networks, but introduce a stronger local bias and yield better-conditioned kernel matrices, which allows gradient descent to converge faster.

Over-parameterized residual networks (ResNets) are amongst the most successful convolutional neural architectures for image processing. Here we study their properties through their Gaussian Process and Neural Tangent kernels. We derive explicit formulas for these kernels, analyze their spectra, and provide bounds on their implied condition numbers. Our results indicate that (1) with ReLU activation, the eigenvalues of these residual kernels decay polynomially at a rate similar to that of the same kernels when skip connections are not used, thus maintaining a similar frequency bias; (2) however, residual kernels are more locally biased. Our analysis further shows that the matrices obtained by these residual kernels yield more favorable condition numbers at finite depths than those obtained without the skip connections, therefore enabling faster convergence of training with gradient descent.
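
The link between condition number and convergence speed can be illustrated with a toy calculation. The sketch below is not the paper's residual kernel: it assumes plain gradient descent on kernel regression with a symmetric positive definite kernel matrix, where the residual along an eigendirection with eigenvalue lam shrinks by a factor (1 - lr * lam) per step. The two spectra and the helper names `gd_iterations` and `condition_number` are hypothetical, chosen only for illustration.

```python
def condition_number(eigenvalues):
    """Ratio of the largest to the smallest eigenvalue."""
    return max(eigenvalues) / min(eigenvalues)


def gd_iterations(eigenvalues, tol=1e-6, max_iter=1_000_000):
    """Count gradient descent steps until every residual component is below tol.

    Works in the eigenbasis of the kernel matrix, so each residual
    component evolves independently of the others.
    """
    lr = 1.0 / max(eigenvalues)          # step size 1 / lambda_max
    residual = [1.0] * len(eigenvalues)  # unit initial residual per eigendirection
    for step in range(1, max_iter + 1):
        residual = [r * (1.0 - lr * lam) for r, lam in zip(residual, eigenvalues)]
        if max(abs(r) for r in residual) < tol:
            return step
    return max_iter


# Hypothetical spectra (assumptions, not values from the paper).
well_conditioned = [1.0, 0.5, 0.25]  # condition number 4
ill_conditioned = [1.0, 0.1, 0.01]   # condition number 100

steps_well = gd_iterations(well_conditioned)
steps_ill = gd_iterations(ill_conditioned)
print(condition_number(well_conditioned), steps_well)
print(condition_number(ill_conditioned), steps_ill)
```

With step size 1 / lambda_max, the slowest component contracts by (1 - 1 / kappa) per step, where kappa is the condition number, so the better-conditioned spectrum needs far fewer steps to reach the same tolerance.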

