LG, ML · Dec 28, 2020

Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization

arXiv:2012.14193v3 · 81 citations
Originality: Highly original
AI Analysis

This work is significant for deep learning practitioners and researchers, as it provides insights into why learning rate choice affects generalization and proposes a new regularization method to improve it.

The paper investigates how the early phase of training affects the local curvature of the loss function, specifically as measured by the Fisher Information Matrix (FIM). It finds that SGD implicitly penalizes the trace of the FIM and that explicitly penalizing this trace can improve generalization by limiting memorization of noisy labels.

The early phase of training a deep neural network has a dramatic effect on the local curvature of the loss function. For instance, using a small learning rate does not guarantee stable optimization, because the optimization trajectory tends to steer towards regions of the loss surface with increasing local curvature. We ask whether this tendency is connected to the widely observed phenomenon that the choice of the learning rate strongly influences generalization. We first show that stochastic gradient descent (SGD) implicitly penalizes the trace of the Fisher Information Matrix (FIM), a measure of the local curvature, from the start of training. We argue that this penalty acts as an implicit regularizer in SGD by showing that explicitly penalizing the trace of the FIM can significantly improve generalization. We highlight that poor final generalization coincides with the trace of the FIM attaining a large value early in training, which we refer to as catastrophic Fisher explosion. Finally, to gain insight into the regularization effect of penalizing the trace of the FIM, we show that it limits memorization by reducing the learning speed of examples with noisy labels more than that of examples with clean labels.
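To make "penalizing the trace of the FIM" concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. Because the Fisher is F = E_x E_{y~p(y|x)}[g g^T] with g = grad log p(y|x), its trace is Tr(F) = E_x E_{y~p(y|x)} ||g||^2. The sketch assumes a toy linear softmax classifier p(y|x) = softmax(W x) and computes the expectation over labels exactly by enumerating classes. The paper's own estimator (for example, one based on sampled labels and mini-batch gradients) may differ. The names `fisher_trace`, `penalized_loss`, and `alpha` are hypothetical.

```python
import numpy as np

def softmax(z):
    # Subtract the row max for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fisher_trace(W, X):
    """Tr(F) for a toy linear softmax model p(y|x) = softmax(W x) (assumption).

    Tr(F) = E_x E_{y ~ p(y|x)} ||grad_W log p(y|x)||^2, where the inner
    expectation uses the model's own predictive distribution, not the labels.
    """
    P = softmax(X @ W.T)              # (N, C) predicted class probabilities
    N, C = P.shape
    total = 0.0
    for i in range(N):
        for y in range(C):
            # For a linear softmax model, grad_W log p(y|x_i) = (e_y - p_i) x_i^T
            e_y = np.zeros(C)
            e_y[y] = 1.0
            g = np.outer(e_y - P[i], X[i])
            total += P[i, y] * np.sum(g ** 2)
    return total / N

def cross_entropy(W, X, labels):
    P = softmax(X @ W.T)
    return -np.mean(np.log(P[np.arange(len(labels)), labels]))

def penalized_loss(W, X, labels, alpha):
    # Training loss plus an explicit penalty on Tr(F), weighted by a hypothetical alpha
    return cross_entropy(W, X, labels) + alpha * fisher_trace(W, X)
```

For this particular model the per-example term simplifies to (1 - ||p||^2) ||x||^2, so the penalty shrinks as predictions become confident. That is a property of the toy model only, not a claim about deep networks.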
