Extended critical regimes of deep neural networks

arXiv:2203.12967
AI Analysis

This work addresses a foundational problem in machine learning by providing theoretical insights into DNN dynamics, potentially guiding more efficient neural architecture design.

The authors develop a mean field theory for deep neural networks which predicts that heavy-tailed weights produce an extended critical regime without fine-tuning of parameters. In this regime, internal representations are balanced between contraction and expansion as they propagate across layers, and training is faster.

Deep neural networks (DNNs) have been successfully applied to many real-world problems, but a complete understanding of their dynamical and computational principles is still lacking. Conventional theoretical frameworks for analysing DNNs often assume random networks with coupling weights obeying Gaussian statistics. However, non-Gaussian, heavy-tailed coupling is a ubiquitous phenomenon in DNNs. Here, by weaving together theories of heavy-tailed random matrices and non-equilibrium statistical physics, we develop a new type of mean field theory for DNNs which predicts that heavy-tailed weights enable the emergence of an extended critical regime without fine-tuning parameters. In this extended critical regime, DNNs exhibit rich and complex propagation dynamics across layers. We further elucidate that the extended criticality endows DNNs with profound computational advantages: balancing the contraction as well as expansion of internal neural representations and speeding up training processes, hence providing a theoretical guide for the design of efficient neural architectures.
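To make the contrast between Gaussian and heavy-tailed coupling concrete, the following is a minimal sketch and not the authors' code. It assumes a fully connected tanh network whose weights are drawn from a symmetric alpha-stable distribution with scale `sigma_w / (2N)^(1/alpha)`. This scaling is an assumption, chosen so that `alpha = 2` reduces to Gaussian weights of variance `sigma_w^2 / N`. The function names `sample_symmetric_stable` and `propagate` are hypothetical, as are the width, depth, and `sigma_w` values. The script propagates two nearby inputs through the same random weights and prints the distance between them at a few depths.

```python
import numpy as np


def sample_symmetric_stable(alpha, size, rng):
    """Draw standard symmetric alpha-stable variates (0 < alpha <= 2).

    Uses the Chambers-Mallows-Stuck construction for the symmetric case.
    alpha = 2 gives a Gaussian with variance 2; alpha = 1 gives a Cauchy.
    """
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (
        np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
        * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha)
    )


def propagate(x0, depth, alpha, sigma_w, rng):
    """Propagate the columns of x0 through a random tanh network.

    x0 has shape (width, batch). Every column sees the same weights.
    Returns pre-activations with shape (depth + 1, width, batch).
    """
    n = x0.shape[0]
    # Assumed scaling: keeps layer inputs O(1) as the width n grows.
    scale = sigma_w / (2.0 * n) ** (1.0 / alpha)
    h = x0
    states = [h]
    for _ in range(depth):
        weights = scale * sample_symmetric_stable(alpha, (n, n), rng)
        h = weights @ np.tanh(h)
        states.append(h)
    return np.stack(states)


rng = np.random.default_rng(0)
width = 200

# Two nearby inputs, stored as the two columns of x.
x = rng.standard_normal((width, 2))
x[:, 1] = x[:, 0] + 1e-3 * rng.standard_normal(width)

for alpha in (2.0, 1.5):  # Gaussian versus heavy-tailed weights
    states = propagate(x, depth=20, alpha=alpha, sigma_w=1.5, rng=rng)
    # Euclidean distance between the two trajectories at each layer.
    dist = np.linalg.norm(states[:, :, 0] - states[:, :, 1], axis=1)
    print(f"alpha={alpha}: distance at layers 0, 5, 10, 20 =", dist[[0, 5, 10, 20]])
```

A single random draw at this width is noisy, so the printed distances only illustrate the setup. Testing the prediction of an extended critical regime would require sweeping `alpha` and `sigma_w` and averaging over many weight realisations.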
