An entropy formula for the Deep Linear Network
This work lays a foundation for a thermodynamic description of DLNs. The contribution is incremental in that it builds on existing geometric methods for addressing overparametrization in neural networks.
The paper studies the geometry of Deep Linear Networks (DLNs). It develops a Riemannian framework for analyzing overparametrization and defines a Boltzmann entropy for the learning process. The main technical step is an explicit construction of an orthonormal basis for the tangent space of the balanced manifold, using Jacobi matrices.
We study the Riemannian geometry of the Deep Linear Network (DLN) as a foundation for a thermodynamic description of the learning process. The main tools are group actions, used to analyze overparametrization, and Riemannian submersion from the space of parameters to the space of observables. The foliation of the balanced manifold in the parameter space by group orbits is used to define and compute a Boltzmann entropy. We also show that the Riemannian geometry on the space of observables defined in [2] is obtained by Riemannian submersion of the balanced manifold. The main technical step is an explicit construction of an orthonormal basis for the tangent space of the balanced manifold using the theory of Jacobi matrices.
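To make the role of group orbits concrete, the following is a minimal numerical sketch and not the paper's construction. It assumes square d x d layers, a full-rank end-to-end matrix W = W_N ... W_1, and the usual balancedness convention W_{p+1}^T W_{p+1} = W_p W_p^T. Under these assumptions, balanced factors can be built from the SVD of W, and different choices of the interior orthogonal matrices give different parameters with the same observable W. The helper names (`random_orthogonal`, `balanced_factors`, `end_to_end`, `balance_defect`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def random_orthogonal(d, rng):
    # The Q factor from the QR decomposition of a Gaussian matrix is orthogonal.
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q


def balanced_factors(W, qs):
    """Return [W_1, ..., W_N] with W_N ... W_1 = W and all factors balanced.

    qs holds the N-1 interior orthogonal matrices Q_1, ..., Q_{N-1}
    (assumption: square layers and full-rank W).
    """
    n = len(qs) + 1
    u, s, vt = np.linalg.svd(W)
    lam = np.diag(s ** (1.0 / n))  # N-th root of the singular values
    # Frames Q_0 = V and Q_N = U are fixed by W; the interior frames are free.
    frames = [vt.T] + list(qs) + [u]
    # W_{p+1} = Q_{p+1} Lambda Q_p^T for p = 0, ..., N-1
    return [frames[p + 1] @ lam @ frames[p].T for p in range(n)]


def end_to_end(factors):
    # Multiply in network order: W_N ... W_2 W_1.
    prod = np.eye(factors[0].shape[1])
    for Wp in factors:
        prod = Wp @ prod
    return prod


def balance_defect(factors):
    # Largest Frobenius norm of W_{p+1}^T W_{p+1} - W_p W_p^T over adjacent layers.
    return max(
        np.linalg.norm(b.T @ b - a @ a.T)
        for a, b in zip(factors, factors[1:])
    )


rng = np.random.default_rng(0)
d, depth = 3, 4
W = rng.standard_normal((d, d))

# Two different points in parameter space, built from different interior frames.
factors_a = balanced_factors(W, [random_orthogonal(d, rng) for _ in range(depth - 1)])
factors_b = balanced_factors(W, [random_orthogonal(d, rng) for _ in range(depth - 1)])

print("same observable:", np.allclose(end_to_end(factors_a), end_to_end(factors_b)))
print("balance defects:", balance_defect(factors_a), balance_defect(factors_b))
print("same parameters:", np.allclose(factors_a[0], factors_b[0]))
```

In this sketch the interior orthogonal matrices move the parameters without changing W or breaking balancedness, which is the kind of redundancy that the abstract analyzes through group actions and the foliation of the balanced manifold by group orbits.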