Infinite-width limit of deep linear neural networks
This work provides foundational theoretical insight into neural network behavior, though its contribution is incremental because it is restricted to linear networks.
The paper analyzes deep linear neural networks in the infinite-width limit, showing that the training dynamics converge to those of gradient descent on an infinitely wide deterministic linear network and that, in the continuous-time limit, the linear predictors converge exponentially fast to the minimal ℓ₂-norm minimizer of the risk.
This paper studies the infinite-width limit of deep linear neural networks initialized with random parameters. We show that, as the number of neurons diverges, the training dynamics converge (in a precise sense) to the dynamics of gradient descent on an infinitely wide deterministic linear neural network. Moreover, although the weights remain random, we obtain their precise law along the training dynamics and prove a quantitative convergence result for the linear predictor in terms of the number of neurons. Finally, we study the continuous-time limit obtained for infinitely wide linear neural networks and show that the linear predictors of the neural network converge at an exponential rate to the minimal $\ell_2$-norm minimizer of the risk.
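The notion of a "minimal $\ell_2$-norm minimizer" reached at an exponential rate can be illustrated on a much simpler problem than the one the paper treats. The sketch below is not the paper's construction: it runs plain gradient descent on a single-layer linear least-squares problem with more unknowns than data points, starting from zero, and measures the distance to the minimum-norm interpolant $X^\top (X X^\top)^{-1} y$. The data `X`, `y`, the step size, and all function names are assumptions chosen only for illustration.

```python
# Illustration only: one-layer linear least squares, not the paper's deep,
# infinitely wide network. All names and data below are hypothetical.

def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def min_norm_solution(X, y):
    """Return X^T (X X^T)^{-1} y for a 2-row matrix X with independent rows."""
    # Gram matrix X X^T, unpacked as [[a, b], [c, d]].
    (a, b), (c, d) = [[sum(p * q for p, q in zip(r1, r2)) for r2 in X] for r1 in X]
    det = a * d - b * c
    # Solve the 2x2 system (X X^T) alpha = y by Cramer's rule.
    alpha = [(d * y[0] - b * y[1]) / det, (a * y[1] - c * y[0]) / det]
    return matvec(transpose(X), alpha)

def gradient_descent(X, y, step, n_steps):
    """Minimise 0.5 * ||X w - y||^2 from w = 0.

    Returns the final iterate, the minimum-norm minimizer, and the distance
    between the iterate and that minimizer after every step.
    """
    w_star = min_norm_solution(X, y)
    w = [0.0] * len(X[0])
    distances = []
    for _ in range(n_steps):
        residual = [p - t for p, t in zip(matvec(X, w), y)]
        grad = matvec(transpose(X), residual)  # gradient is X^T (X w - y)
        w = [wi - step * gi for wi, gi in zip(w, grad)]
        distances.append(sum((wi - si) ** 2 for wi, si in zip(w, w_star)) ** 0.5)
    return w, w_star, distances

# Two data points in three dimensions: infinitely many exact minimizers exist.
X = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]
y = [1.0, 2.0]
w, w_star, distances = gradient_descent(X, y, step=0.1, n_steps=200)
```

For this data the minimum-norm minimizer is `w_star = [-1/3, 2/3, 4/3]`. Because the iterates start at zero, they stay in the row space of `X`, so among all exact minimizers they can only approach this one. The nonzero eigenvalues of $X^\top X$ are 1 and 6, so with step size 0.1 the distance shrinks by a factor of at most 0.9 per step, a discrete analogue of an exponential rate. Whether and how this picture carries over to deep, infinitely wide linear networks is what the paper itself establishes.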