Fokker-Planck to Callan-Symanzik: evolution of weight matrices under training
This work, aimed at researchers in statistical physics and machine learning, addresses the curse of dimensionality in simulating training dynamics. It is incremental, however: it applies known methods to a specific, simplified network architecture.
The paper tackles the problem of simulating neural network training dynamics by using the Fokker-Planck equation to model the evolution of the probability density of weight matrices in the bottleneck layers of a simple auto-encoder, and it compares the theoretical predictions against the empirical distributions of the output data.
The dynamical evolution of a neural network during training has been a fascinating subject of study. First-principles derivations of the generic evolution of variables in statistical physics systems have proved useful for describing training dynamics conceptually; in practice, this means numerically solving equations such as the Fokker-Planck equation. Simulating an entire network, however, inevitably runs into the curse of dimensionality. In this paper, we use the Fokker-Planck equation to simulate the probability density evolution of individual weight matrices in the bottleneck layers of a simple auto-encoder with two bottleneck layers, and we compare the theoretical evolutions against the empirical ones by examining the output data distributions. We also derive physically relevant partial differential equations, such as the Callan-Symanzik and Kardar-Parisi-Zhang equations, from our dynamical equation.
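To make "numerically solving the Fokker-Planck equation and comparing it with the empirical evolution" concrete, here is a minimal one-dimensional sketch. Everything in it is an assumption for illustration, not the paper's model: a single scalar weight w instead of a weight matrix, a quadratic loss L(w) = A/2 (w - W_STAR)^2, and constant Gaussian noise of strength D standing in for SGD noise. The names A, W_STAR, D, M0, S0 and the helper functions are hypothetical. The density p(w, t) obeys dp/dt = -d/dw[f(w) p] + D d^2p/dw^2 with drift f(w) = -L'(w). The sketch solves it with explicit finite differences and compares the result with Euler-Maruyama samples of the matching Langevin equation dw = f(w) dt + sqrt(2D) dB.

```python
import numpy as np

# Hypothetical toy setup (assumption, not the paper's model): one scalar weight w,
# quadratic loss L(w) = A/2 * (w - W_STAR)**2, constant noise strength D.
A, W_STAR, D = 1.0, 0.5, 0.05
M0, S0 = -0.5, 0.2  # mean and std of the initial weight distribution
T_FINAL = 2.0


def drift(w):
    """Deterministic part of the update, -dL/dw."""
    return -A * (w - W_STAR)


def solve_fokker_planck(grid, dt=5e-4, t_final=T_FINAL):
    """Explicit finite differences for dp/dt = -d/dw[drift(w) p] + D d2p/dw2."""
    dw = grid[1] - grid[0]
    p = np.exp(-(grid - M0) ** 2 / (2 * S0**2)) / (S0 * np.sqrt(2 * np.pi))
    f = drift(grid)
    for _ in range(int(round(t_final / dt))):
        flux = f * p
        adv = (flux[2:] - flux[:-2]) / (2 * dw)        # d/dw [drift * p]
        lap = (p[2:] - 2 * p[1:-1] + p[:-2]) / dw**2  # d2p/dw2
        p[1:-1] = p[1:-1] + dt * (-adv + D * lap)
        p[0] = p[-1] = 0.0  # zero density at the edges, far from the bulk of the mass
    return p


def moments(grid, p):
    """Mean and variance of a density sampled on a uniform grid."""
    dw = grid[1] - grid[0]
    mass = p.sum() * dw
    mean = (grid * p).sum() * dw / mass
    var = ((grid - mean) ** 2 * p).sum() * dw / mass
    return mean, var


def simulate_langevin(n=20_000, dt=1e-3, t_final=T_FINAL, seed=0):
    """Euler-Maruyama samples of dw = drift(w) dt + sqrt(2 D) dB (the empirical side)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(M0, S0, size=n)
    for _ in range(int(round(t_final / dt))):
        w = w + drift(w) * dt + np.sqrt(2 * D * dt) * rng.standard_normal(n)
    return w


def exact_moments(t=T_FINAL):
    """Closed-form Ornstein-Uhlenbeck mean and variance for this quadratic loss."""
    decay = np.exp(-A * t)
    mean = W_STAR + (M0 - W_STAR) * decay
    var = S0**2 * decay**2 + (D / A) * (1 - decay**2)
    return mean, var


grid = np.linspace(-2.0, 3.0, 501)
fp_mean, fp_var = moments(grid, solve_fokker_planck(grid))
samples = simulate_langevin()
print("Fokker-Planck:", fp_mean, fp_var)
print("Langevin     :", samples.mean(), samples.var())
print("closed form  :", *exact_moments())
```

The explicit step dt is chosen so that D * dt / dw^2 stays well below 1/2, the usual stability bound for this scheme. Because the toy loss is quadratic, the dynamics is an Ornstein-Uhlenbeck process with a closed-form mean and variance, which gives a third reference point; for a non-quadratic loss only the grid solution and the sampled trajectories would remain to compare.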