STAT-MECH | CL | Oct 10, 2023

Jaynes Machine: The universal microstructure of deep neural networks

arXiv:2310.06960v1 | 3 citations | h-index: 18
Originality: Highly original
AI Analysis

If it holds up, this theory could reduce the data, time, and computational resources needed to train deep neural networks, with potential relevance across ML/AI.

The paper predicts that connection strengths in the highly connected layers of deep neural networks follow a universal lognormal distribution, and it supports this prediction with empirical data from six large-scale networks.

We present a novel theory of the microstructure of deep neural networks. Using a theoretical framework called statistical teleodynamics, which is a conceptual synthesis of statistical thermodynamics and potential game theory, we predict that all highly connected layers of deep neural networks have a universal microstructure of connection strengths that is distributed lognormally ($LN(μ, σ)$). Furthermore, under ideal conditions, the theory predicts that $μ$ and $σ$ are the same for all layers in all networks. This is shown to be the result of an arbitrage equilibrium where all connections compete and contribute the same effective utility towards the minimization of the overall loss function. These surprising predictions are shown to be supported by empirical data from six real-world large-scale deep neural networks. We also discuss how these results can be exploited to reduce the amount of data, time, and computational resources needed to train large deep neural networks.
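
As a rough illustration of what checking the lognormal prediction might look like, the sketch below estimates $μ$ and $σ$ for each layer as the mean and standard deviation of the log of the connection strengths. This is not the paper's procedure. It assumes that "connection strength" means the absolute value of a weight, it drops near-zero weights, and it runs on synthetic layers rather than on any of the six networks. The function name `fit_lognormal` and the layer names are hypothetical.

```python
import numpy as np


def fit_lognormal(weights, eps=1e-12):
    """Estimate (mu, sigma) of LN(mu, sigma) from a layer's weights.

    Assumption (not from the paper): connection strength is |w|,
    and weights with |w| <= eps are discarded before taking logs.
    """
    strengths = np.abs(np.asarray(weights, dtype=float)).ravel()
    strengths = strengths[strengths > eps]
    log_strengths = np.log(strengths)
    # For a lognormal variable, log(x) is normal with mean mu and std sigma.
    return log_strengths.mean(), log_strengths.std()


def synthetic_layer(rng, shape, mu, sigma):
    """Build a fake weight matrix: lognormal magnitudes with random signs."""
    magnitudes = rng.lognormal(mean=mu, sigma=sigma, size=shape)
    signs = rng.choice([-1.0, 1.0], size=shape)
    return signs * magnitudes


rng = np.random.default_rng(0)

# Synthetic stand-ins for trained layers, all drawn with the same (mu, sigma),
# mimicking the "ideal conditions" case described in the abstract.
TRUE_MU, TRUE_SIGMA = -3.0, 0.8
layers = {
    "layer_a": synthetic_layer(rng, (512, 512), TRUE_MU, TRUE_SIGMA),
    "layer_b": synthetic_layer(rng, (1024, 256), TRUE_MU, TRUE_SIGMA),
}

fits = {name: fit_lognormal(w) for name, w in layers.items()}
for name, (mu, sigma) in fits.items():
    print(f"{name}: mu={mu:.3f}, sigma={sigma:.3f}")
```

On real networks, one would replace the synthetic matrices with the weight tensors of each highly connected layer and compare the fitted $μ$ and $σ$ across layers. A goodness-of-fit check on the log strengths would be a natural next step, but it is left out here.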

