ML · LG · DATA-AN · Dec 31, 2021

Separation of Scales and a Thermodynamic Description of Feature Learning in Some CNNs

arXiv:2112.15383v3 · 69 citations
Originality: Incremental advance
AI Analysis

This paper provides a foundational thermodynamic description for understanding feature learning in DNNs; it is rated an incremental advance because it builds on existing infinite-width theories.

The paper addresses the difficulty of analyzing deep neural networks (DNNs) by identifying a separation of scales in trained convolutional and fully connected networks. It shows that layers couple only through the second moments (kernels) of their activations and pre-activations, which leads to a thermodynamic theory that yields accurate predictions in various settings.

Deep neural networks (DNNs) are powerful tools for compressing and distilling information. Their scale and complexity, often involving billions of inter-dependent parameters, render direct microscopic analysis difficult. Under such circumstances, a common strategy is to identify slow variables that average the erratic behavior of the fast microscopic variables. Here, we identify a similar separation of scales occurring in fully trained finitely over-parameterized deep convolutional neural networks (CNNs) and fully connected networks (FCNs). Specifically, we show that DNN layers couple only through the second moment (kernels) of their activations and pre-activations. Moreover, the latter fluctuates in a nearly Gaussian manner. For infinite width DNNs, these kernels are inert, while for finite ones they adapt to the data and yield a tractable data-aware Gaussian Process. The resulting thermodynamic theory of deep learning yields accurate predictions in various settings. In addition, it provides new ways of analyzing and understanding DNNs in general.
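As a rough illustration of the "second moment (kernel) of activations" mentioned above, the following minimal sketch computes that kernel for a single randomly initialized ReLU layer and measures how much it fluctuates between random draws at two widths. This is not the paper's method and involves no training: the function names (`empirical_kernel`, `kernel_spread`), the ReLU nonlinearity, the weight scaling, and the toy Gaussian inputs are all assumptions chosen for illustration. It only shows the object the theory tracks and the fact that its fluctuations shrink as the layer gets wider.

```python
import numpy as np

def empirical_kernel(X, width, rng):
    """Second moment of hidden activations for one random ReLU layer (illustrative)."""
    d = X.shape[1]
    # Assumed initialization: i.i.d. Gaussian weights with variance 1/d.
    W = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, width))
    H = np.maximum(X @ W, 0.0)   # activations, shape (n_samples, width)
    return H @ H.T / width       # kernel, shape (n_samples, n_samples)

def kernel_spread(X, width, n_draws=20, seed=0):
    """Mean entrywise standard deviation of the kernel across weight draws."""
    rng = np.random.default_rng(seed)
    Ks = np.stack([empirical_kernel(X, width, rng) for _ in range(n_draws)])
    return Ks.std(axis=0).mean()

# Toy inputs (hypothetical data, 8 samples in 16 dimensions).
X = np.random.default_rng(1).normal(size=(8, 16))

spread_narrow = kernel_spread(X, width=32)
spread_wide = kernel_spread(X, width=4096)
print(f"kernel fluctuation at width 32:   {spread_narrow:.4f}")
print(f"kernel fluctuation at width 4096: {spread_wide:.4f}")
```

In this toy setting the wide layer's kernel is nearly deterministic, while the narrow layer's kernel varies noticeably between draws. The abstract's claim concerns trained networks, where finite-width kernels adapt to the data; that adaptation is not modeled here.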
