Rethinking Breiman's Dilemma in Neural Networks: Phase Transitions of Margin Dynamics
This work addresses the challenge of using margins to predict generalization in neural networks and is aimed mainly at researchers in machine learning theory. It is incremental in that it builds on existing margin theory and phase transition concepts.
The paper revisits Breiman's dilemma in deep neural networks by analyzing phase transitions of normalized margin distributions during training, showing that when model expressiveness matches data complexity, high margin dynamics can predict generalization error trends, but over-expressive models exhibit uniform margin improvements that fail to predict overfitting.
Margin enlargement over training data has been an important strategy in machine learning since the perceptron, with the aim of making classifiers more robust and thereby improving generalization. Yet Breiman (1999) identified a dilemma: a uniform improvement in the margin distribution does NOT necessarily reduce generalization error. In this paper, we revisit Breiman's dilemma in deep neural networks with recently proposed spectrally normalized margins, from a novel perspective based on phase transitions of normalized margin distributions in training dynamics. The normalized margin distribution of a classifier over the data can be divided into two parts that often behave differently during training: low (small) margins, such as the negative margins of misclassified samples, and high (large) margins of correctly classified samples with high confidence. Low margins on the training and test sets are often effectively improved during training, along with reductions in training and test error, while high margins may exhibit different dynamics that reflect the trade-off between the expressive power of the model and the complexity of the data. When data complexity is comparable to model expressiveness, the high margin distributions of both training and test data undergo similar decrease-increase phase transitions during training. In such cases, one can predict the trend of the generalization or test error from margin-based generalization bounds with restricted Rademacher complexities, as this paper shows in two ways, using an early stopping time that exploits such phase transitions. On the other hand, over-expressive models may have both low and high training margins undergoing uniform improvement, while test margin dynamics exhibit a distinct phase transition. This reconfirms Breiman's dilemma for overparameterized neural networks, where margins fail to predict overfitting.
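To make the quantities in the abstract concrete, the following minimal sketch computes a normalized margin distribution and reads off one of its quantiles. It rests on assumptions that are not taken from the paper: the margin of a sample is its true-class score minus the largest other-class score, and the normalization constant is the product of the layer spectral norms, which upper-bounds the Lipschitz constant of a feed-forward network with 1-Lipschitz activations. The paper's exact normalization may differ, and the function names are hypothetical.

```python
import numpy as np


def multiclass_margins(logits, labels):
    """Per-sample margin: true-class score minus the best other-class score.

    A negative margin means the sample is misclassified.
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    idx = np.arange(len(labels))
    true_scores = logits[idx, labels]
    # Mask out the true class so the max runs over the other classes only.
    others = logits.copy()
    others[idx, labels] = -np.inf
    return true_scores - others.max(axis=1)


def spectral_norm_product(weights):
    """Product of the largest singular values of the layer weight matrices.

    Assumption: used here as a simplified normalization constant.
    """
    return float(np.prod([np.linalg.norm(W, ord=2) for W in weights]))


def normalized_margin_quantile(logits, labels, weights, q):
    """q-quantile of the margin distribution after spectral normalization."""
    margins = multiclass_margins(logits, labels) / spectral_norm_product(weights)
    return float(np.quantile(margins, q))
```

Under these assumptions, evaluating `normalized_margin_quantile` with a large `q` (for example 0.9) on the training set and on the test set after each epoch would give one way to trace the high-margin dynamics described above. Using a small `q` would track the low-margin part of the distribution instead.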