LG · Mar 9, 2023

Inversion dynamics of class manifolds in deep learning reveals tradeoffs underlying generalisation

Simone Ciceri, Lorenzo Cassani, Matteo Osella, Pietro Rotondo, Filippo Valle, Marco Gherardi

arXiv:2303.05161v28.810 citations · h-index: 20

Originality: Incremental advance

AI Analysis

This work addresses the fundamental trade-off between class separation and generalization in deep learning, providing insights into optimization dynamics that are incremental but broadly applicable across models.

The study investigates how deep learning models balance class separation and feature entanglement during training to avoid overfitting, finding a consistent non-monotonic trend where initial segregation is followed by increased entanglement, with the inversion point stable across datasets and architectures.

To achieve near-zero training error in a classification problem, the layers of a feed-forward network have to disentangle the manifolds of data points with different labels to facilitate discrimination. However, excessive class separation can lead to overfitting, since good generalisation requires learning invariant features, which involve some level of entanglement. We report on numerical experiments showing how the optimisation dynamics finds representations that balance these opposing tendencies with a non-monotonic trend. After a fast segregation phase, a slower rearrangement (conserved across data sets and architectures) increases the class entanglement. The training error at the inversion is stable under subsampling and across network initialisations and optimisers, which characterises it as a property solely of the data structure and (very weakly) of the architecture. The inversion is the manifestation of tradeoffs elicited by well-defined and maximally stable elements of the training set, coined "stragglers", which are particularly influential for generalisation.
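
As a rough illustration of the non-monotonic trend described in the abstract, the sketch below locates an inversion point as the epoch where a per-epoch class-entanglement curve reaches its minimum, then reads off the training error at that epoch. The abstract does not specify how entanglement is measured, so everything here is an assumption: the function name `find_inversion`, the idea of taking the minimum of the curve, and the synthetic curves (a fast decay plus a slow linear rise) are hypothetical stand-ins, not the authors' method or data.

```python
import numpy as np


def find_inversion(entanglement, train_error):
    """Return (epoch index, training error) at the minimum of the entanglement curve.

    Both arguments are per-epoch sequences of equal length. The entanglement
    measure itself is left unspecified; any scalar that decreases as classes
    separate would fit this sketch.
    """
    entanglement = np.asarray(entanglement, dtype=float)
    train_error = np.asarray(train_error, dtype=float)
    if entanglement.shape != train_error.shape:
        raise ValueError("entanglement and train_error must have the same length")
    t_inv = int(np.argmin(entanglement))  # epoch where segregation turns into re-entanglement
    return t_inv, float(train_error[t_inv])


# Synthetic curves for illustration only (not results from the paper):
# a fast segregation phase followed by a slower increase in entanglement.
epochs = np.arange(50)
entanglement = 0.6 * np.exp(-epochs / 3.0) + 0.004 * epochs + 0.1
train_error = 0.5 * np.exp(-epochs / 10.0)

t_inv, err_inv = find_inversion(entanglement, train_error)
print(f"inversion at epoch {t_inv}, training error {err_inv:.3f}")
```

With real training logs, one would presumably repeat this over several initialisations, optimisers, or subsamples and compare the resulting training errors at the inversion, which is the quantity the abstract reports as stable.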
