LG · DIS-NN · STAT-MECH · Nov 23, 2023

Weight fluctuations in (deep) linear neural networks and a derivation of the inverse-variance flatness relation

arXiv:2311.14120v4 · 2 citations · h-index: 5
Originality: Synthesis-oriented
AI Analysis

This work offers theoretical insights into training dynamics for researchers in machine learning theory, but it is incremental as it builds on known observations in a simplified linear model.

The authors investigate weight fluctuations in underparameterized linear neural networks in the stationary training regime. In single-layer networks, the weight fluctuations are anisotropic but effectively experience an isotropic loss. In two-layer networks, inter-layer coupling acts as an additional source of anisotropy, and the fluctuations are subject to an anisotropic loss whose flatness is inversely related to the fluctuation variance. This yields an analytical derivation of the inverse variance-flatness relation in a model of a deep linear network.

We investigate the stationary (late-time) training regime of single- and two-layer underparameterized linear neural networks within the continuum limit of stochastic gradient descent (SGD) for synthetic Gaussian data. In the case of a single-layer network in the weakly underparameterized regime, the spectrum of the noise covariance matrix deviates notably from the Hessian, which can be attributed to the broken detailed balance of SGD dynamics. The weight fluctuations are in this case generally anisotropic, but effectively experience an isotropic loss. For an underparameterized two-layer network, we describe the stochastic dynamics of the weights in each layer and analyze the associated stationary covariances. We identify the inter-layer coupling as a distinct source of anisotropy for the weight fluctuations. In contrast to the single-layer case, the weight fluctuations are effectively subject to an anisotropic loss, the flatness of which is inversely related to the fluctuation variance. We thereby provide an analytical derivation of the recently observed inverse variance-flatness relation in a model of a deep linear neural network.
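The single-layer setting described above can be explored numerically. The following is a minimal sketch, not the authors' method: it runs discrete-time minibatch SGD (rather than the continuum limit) on a small synthetic Gaussian regression problem and records the weights after a burn-in period. The function name, the teacher-plus-label-noise setup, and all parameter values are illustrative assumptions.

```python
import numpy as np


def sgd_weight_fluctuations(d=5, n=50, batch=5, lr=0.02,
                            steps=20000, burn_in=5000, seed=0):
    """Run minibatch SGD on a linear model and collect late-time weights.

    Hypothetical setup: n > d (underparameterized), Gaussian inputs,
    labels from a random teacher plus Gaussian label noise.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    w_teacher = rng.standard_normal(d)
    # Label noise keeps the minibatch gradient noise nonzero at the minimum.
    y = X @ w_teacher + 0.5 * rng.standard_normal(n)

    hessian = X.T @ X / n                          # Hessian of the mean squared loss (with factor 1/2)
    w_star = np.linalg.lstsq(X, y, rcond=None)[0]  # full-batch minimizer

    w = np.zeros(d)
    samples = []
    for t in range(steps):
        idx = rng.integers(0, n, size=batch)       # sample a minibatch with replacement
        residual = X[idx] @ w - y[idx]
        w = w - lr * (X[idx].T @ residual) / batch
        if t >= burn_in:
            samples.append(w.copy())

    samples = np.asarray(samples)
    cov = np.cov(samples, rowvar=False)            # empirical stationary weight covariance
    return hessian, cov, samples.mean(axis=0), w_star


hessian, cov, w_mean, w_star = sgd_weight_fluctuations()
print("Hessian eigenvalues:   ", np.linalg.eigvalsh(hessian))
print("Covariance eigenvalues:", np.linalg.eigvalsh(cov))
```

Comparing the two printed spectra gives a rough picture of how the fluctuation variances relate to the curvature of the loss in this toy problem. Reproducing the paper's quantitative results would require its specific regime and the two-layer model.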
