The learning phases in NN: From Fitting the Majority to Fitting a Few
This work addresses an open debate about the learning dynamics of deep neural networks, a question relevant to researchers and practitioners in deep learning.
The paper tackles the controversy over learning dynamics in deep neural networks by analyzing how well a layer can reconstruct its input and predict the output during training. Under mild assumptions on the data, it shows that training begins with a prototyping phase in which reconstruction loss decreases, followed by a phase in which classification loss is reduced for a few samples at the expense of increased reconstruction loss.
The learning dynamics of deep neural networks are subject to controversy. Based on the information bottleneck (IB) theory, separate fitting and compression phases have been put forward, but they have since been heavily debated. We approach learning dynamics by analyzing a layer's ability to reconstruct the input and its prediction performance, based on the evolution of its parameters during training. We show that, under mild assumptions on the data, there exists a prototyping phase that initially decreases reconstruction loss, followed by a phase that reduces the classification loss of a few samples and thereby increases reconstruction loss. Aside from providing a mathematical analysis of single-layer classification networks, we also assess this behavior using common computer vision datasets and architectures such as ResNet and VGG.
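The following is a minimal sketch, in Python with NumPy, of how the two quantities could be tracked while training a single-layer classifier. The reconstruction measure is an assumption and not necessarily the paper's definition. Here the input x is reconstructed by least squares from the layer's pre-activations Wx, i.e. x_hat = pinv(W) W x, which is the projection of x onto the row space of W. The helper names (classification_loss, reconstruction_loss, train) and the synthetic two-class data are hypothetical. Whether the two phases actually appear depends on the data and hyperparameters. The sketch only records the curves and does not reproduce the paper's results.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted by the row max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def classification_loss(W, X, y):
    # Mean cross-entropy of a single linear layer without bias (logits = X W^T).
    p = softmax(X @ W.T)
    return float(-np.mean(np.log(p[np.arange(len(y)), y] + 1e-12)))

def reconstruction_loss(W, X):
    # ASSUMPTION (hypothetical proxy): reconstruct each x from W x by least squares,
    # x_hat = pinv(W) @ W @ x, i.e. the orthogonal projection onto the row space of W.
    P = np.linalg.pinv(W) @ W          # (d, d) symmetric projection matrix
    X_hat = X @ P.T
    return float(np.mean(np.sum((X - X_hat) ** 2, axis=1)))

def train(X, y, n_classes, lr=0.1, epochs=200, seed=0):
    # Plain full-batch gradient descent on the cross-entropy loss;
    # records (classification loss, reconstruction loss) after every step.
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((n_classes, X.shape[1]))
    history = []
    for _ in range(epochs):
        p = softmax(X @ W.T)
        p[np.arange(len(y)), y] -= 1.0   # gradient of cross-entropy w.r.t. logits
        grad = p.T @ X / len(y)          # gradient w.r.t. W, shape (n_classes, d)
        W -= lr * grad
        history.append((classification_loss(W, X, y), reconstruction_loss(W, X)))
    return W, history

# Hypothetical synthetic data: two Gaussian classes in 5 dimensions.
rng = np.random.default_rng(1)
n, d = 200, 5
y = rng.integers(0, 2, size=n)
means = np.array([[2.0, 0, 0, 0, 0], [-2.0, 0, 0, 0, 0]])
X = means[y] + rng.standard_normal((n, d))

W, hist = train(X, y, n_classes=2)
for epoch in (0, 49, 199):
    print(epoch, hist[epoch])
```

Plotting both columns of `hist` against the epoch is one way to look for an early drop in reconstruction loss followed by a later rise; with a different reconstruction definition the curves may look different.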