LG · ML · Jan 28, 2020

On Random Kernels of Residual Architectures

arXiv:2001.10460v4 · 7 citations
AI Analysis

This work offers theoretical insight for researchers in deep learning theory by characterizing how popular residual architectures behave as kernel methods, though it is incremental in that it builds on existing NTK analysis.

The paper studies how residual architectures such as ResNets and DenseNets behave in terms of their Neural Tangent Kernels (NTKs) at finite width and depth. It finds that these architectures start closer to the kernel regime than vanilla networks: ResNets can converge to the NTK when depth and width grow together under a suitable initialization, while DenseNets converge as width grows at a rate independent of both depth and weight scale.
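For reference, the NTK in its standard definition is the inner product of parameter gradients of the network output f_θ, summed over all parameters θ_p. In the infinite-width limit (at fixed depth, under the NTK parametrization), the kernel at initialization concentrates around a deterministic limit Θ^∞ and stays essentially constant during gradient-descent training; this is the "kernel regime", and finite width and depth corrections measure how far a finite network's Θ_θ is from that limit.

```latex
\Theta_\theta(x, x') = \sum_{p} \frac{\partial f_\theta(x)}{\partial \theta_p}\,
                                \frac{\partial f_\theta(x')}{\partial \theta_p}
                     = \nabla_\theta f_\theta(x)^{\top}\, \nabla_\theta f_\theta(x'),
\qquad
\Theta_\theta(x, x') \xrightarrow[\;\text{width}\to\infty\;]{} \Theta^{\infty}(x, x')
```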

We derive finite width and depth corrections for the Neural Tangent Kernel (NTK) of ResNets and DenseNets. Our analysis reveals that finite-size residual architectures are initialized much closer to the "kernel regime" than their vanilla counterparts: in networks that do not use skip connections, convergence to the NTK requires fixing the depth while increasing the layers' width. In ResNets, by contrast, convergence to the NTK may occur when depth and width tend to infinity simultaneously, provided the network is properly initialized. In DenseNets, convergence of the NTK to its limit as the width tends to infinity is guaranteed, at a rate that is independent of both the depth and the scale of the weights. Our experiments validate the theoretical results and demonstrate the advantage of deep ResNets and DenseNets for kernel regression with random gradient features.
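To make "closeness to the kernel regime" concrete, here is a minimal sketch that computes the empirical NTK of a small residual MLP at random initialization and measures its relative spread across seeds. Everything in it is an illustrative assumption, not the paper's model or experimental code: the fully connected ReLU residual architecture, the NTK-style 1/sqrt(fan-in) scaling, the branch scale `alpha`, and the hypothetical helpers `init_params`, `forward_backward`, `empirical_ntk` and `ntk_spread`. The kernel is built from the gradient features ∇θ f(x) at a random initialization, which is our reading of the abstract's "random gradient features".

```python
import numpy as np

def init_params(d, n, depth, rng):
    """Hypothetical model: i.i.d. standard Gaussian weights; the
    1/sqrt(fan_in) scaling is applied in the forward pass (NTK-style)."""
    return {
        "U": rng.standard_normal((n, d)),                          # input layer
        "W": [rng.standard_normal((n, n)) for _ in range(depth)],  # residual branches
        "v": rng.standard_normal(n),                               # scalar readout
    }

def forward_backward(params, x, alpha):
    """Forward and backward pass of the assumed residual MLP:
        h_0 = U x / sqrt(d)
        h_l = h_{l-1} + alpha * W_l relu(h_{l-1}) / sqrt(n),  l = 1..L
        f   = v . h_L / sqrt(n)
    Returns f, the hidden states h_0..h_L and deltas[l] = df/dh_l."""
    U, Ws, v = params["U"], params["W"], params["v"]
    n, d = U.shape
    hs = [U @ x / np.sqrt(d)]
    for W in Ws:
        hs.append(hs[-1] + alpha * W @ np.maximum(hs[-1], 0.0) / np.sqrt(n))
    f = v @ hs[-1] / np.sqrt(n)

    deltas = [None] * (len(Ws) + 1)
    deltas[-1] = v / np.sqrt(n)
    for l in range(len(Ws), 0, -1):
        # Transposed Jacobian of h_l w.r.t. h_{l-1}: identity (skip) + branch term.
        mask = hs[l - 1] > 0
        deltas[l - 1] = deltas[l] + alpha * (Ws[l - 1].T @ deltas[l]) * mask / np.sqrt(n)
    return f, hs, deltas

def empirical_ntk(params, x1, x2, alpha):
    """Theta(x1, x2) = <grad_theta f(x1), grad_theta f(x2)> over all parameters.
    Each weight-matrix gradient is an outer product, so its Frobenius inner
    product factorizes as (delta . delta') * (input . input')."""
    n, d = params["U"].shape
    _, hs1, ds1 = forward_backward(params, x1, alpha)
    _, hs2, ds2 = forward_backward(params, x2, alpha)
    theta = hs1[-1] @ hs2[-1] / n                        # readout v
    theta += (ds1[0] @ ds2[0]) * (x1 @ x2) / d           # input layer U
    for l in range(1, len(params["W"]) + 1):             # residual weights W_l
        a1, a2 = np.maximum(hs1[l - 1], 0.0), np.maximum(hs2[l - 1], 0.0)
        theta += alpha**2 / n * (ds1[l] @ ds2[l]) * (a1 @ a2)
    return theta

def ntk_spread(d, n, depth, alpha, x1, x2, seeds=20):
    """Relative standard deviation of Theta(x1, x2) over random initializations;
    a smaller value means the finite network's kernel at init is less random."""
    vals = [empirical_ntk(init_params(d, n, depth, np.random.default_rng(s)), x1, x2, alpha)
            for s in range(seeds)]
    return np.std(vals) / abs(np.mean(vals))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(8)
    depth = 10
    for n in (32, 128, 256):
        # Diagonal entry Theta(x, x); alpha = 1/sqrt(depth) is our own choice.
        print(n, ntk_spread(8, n, depth, 1 / np.sqrt(depth), x, x))
```

Factorizing each outer-product gradient into two dot products avoids materializing the full gradient vector, so the kernel costs about two forward/backward passes per pair of inputs. The printed spreads depend on the inputs, seeds and the assumed `alpha`, and are not results from the paper.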

