ML | LG | Oct 2, 2023

Commutative Width and Depth Scaling in Deep Neural Networks

arXiv:2310.01683v1 | 2 citations | h-index: 15
Originality: Incremental advance
AI Analysis

This paper offers researchers theoretical insight into how neural networks scale, though the advance is incremental because it extends prior work.

The paper investigates whether the order of taking infinite width and depth limits in deep neural networks with skip connections affects the resulting neural covariance kernel, finding that with suitable scaling, the limits commute and yield the same covariance structure regardless of order.

This paper is the second in the series Commutative Scaling of Width and Depth (WD), which studies the commutativity of infinite width and depth limits in deep neural networks. Our aim is to understand the behaviour of neural functions (functions that depend on a neural network model) as width and depth go to infinity (in some sense), and eventually to identify settings under which commutativity holds, i.e. the neural function tends to the same limit no matter how the width and depth limits are taken. In this paper, we formally introduce and define the commutativity framework and discuss its implications for neural network design and scaling. We study commutativity for the neural covariance kernel, which reflects how network layers separate data. Our findings extend previous results established in [55] by showing that taking the width and depth to infinity in a deep neural network with skip connections, when branches are suitably scaled to avoid exploding behaviour, results in the same covariance structure no matter how that limit is taken. This has a number of theoretical and practical implications that we discuss in the paper. The proof techniques in this paper are novel and rely on tools that are more accessible to readers who are not familiar with stochastic calculus (used in the proofs of WD(I)).
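To make the object of study concrete, here is a minimal numerical sketch of a neural covariance kernel in a residual network with scaled branches. Everything specific in it is an assumption made for illustration and is not taken from the abstract: the ReLU activation, the standard Gaussian weights, the 1/sqrt(depth) branch factor, the random input embedding, and the function name `covariance_kernel`. The paper's exact architecture and scaling may differ.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def covariance_kernel(x1, x2, width, depth, rng):
    """Empirical covariance kernel <Y_L(x1), Y_L(x2)> / width of a toy ResNet.

    Assumed model (illustrative only):
        Y_0 = W_in x / sqrt(d)
        Y_l = Y_{l-1} + (1 / sqrt(depth * width)) * W_l relu(Y_{l-1})
    with i.i.d. standard Gaussian entries in W_in and every W_l.
    """
    d = x1.shape[0]
    w_in = rng.standard_normal((width, d)) / np.sqrt(d)
    y1, y2 = w_in @ x1, w_in @ x2

    # 1/sqrt(width) normalises the matrix product; 1/sqrt(depth) is the
    # assumed branch scaling that keeps the residual stream from exploding.
    scale = 1.0 / np.sqrt(depth * width)
    for _ in range(depth):
        w = rng.standard_normal((width, width))
        # Both inputs pass through the same weights in every layer.
        y1, y2 = y1 + scale * (w @ relu(y1)), y2 + scale * (w @ relu(y2))

    return float(y1 @ y2) / width


if __name__ == "__main__":
    data_rng = np.random.default_rng(0)
    x1 = data_rng.standard_normal(16)
    x2 = data_rng.standard_normal(16)

    # Grow width and depth in different proportions and compare the kernel.
    for width, depth in [(128, 16), (128, 64), (256, 16), (256, 64)]:
        k = covariance_kernel(x1, x2, width, depth, np.random.default_rng(1))
        print(f"width={width:4d} depth={depth:3d} kernel={k:.4f}")
```

If the limits commute, as the abstract claims for suitably scaled branches, the printed values should settle around a common number as both width and depth grow, whichever of the two is increased first. A finite simulation with a single random seed can only hint at this; it is not evidence for the theorem.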

