Spring-block theory of feature learning in deep neural networks
This work addresses a foundational problem in machine learning by providing a theoretical framework for feature learning in deep nets. The contribution is incremental in that it builds on existing theories.
The paper tackles the problem of understanding how feature learning emerges in deep neural networks from factors such as nonlinearity and noise, and proposes a macroscopic mechanical theory that links feature learning across layers to generalization.
Feature-learning deep nets progressively collapse data to a regular low-dimensional geometry. How this emerges from the collective action of nonlinearity, noise, learning rate, and other factors has eluded first-principles theories built from microscopic neuronal dynamics. We exhibit a noise-nonlinearity phase diagram that identifies regimes where shallow or deep layers learn more effectively, and propose a macroscopic mechanical theory that reproduces the diagram and links feature learning across layers to generalization.