LG | AI | ML | Jul 21, 2021

Distribution of Classification Margins: Are All Data Equal?

arXiv:2107.10199v1 | 4 citations
Originality: Incremental advance
AI Analysis

This work addresses generalization in deep learning. The advance is incremental because it builds on existing margin theory.

The paper studies generalization in deep neural networks. It proposes the area under the curve of the training-set margin distribution as a measure of generalization, and shows empirically that, once the training data are separated, the training set can be reduced by more than 99% without significant loss of performance.

Recent theoretical results show that gradient descent on deep neural networks under exponential loss functions locally maximizes classification margin, which is equivalent to minimizing the norm of the weight matrices under margin constraints. This property of the solution however does not fully characterize the generalization performance. We motivate theoretically and show empirically that the area under the curve of the margin distribution on the training set is in fact a good measure of generalization. We then show that, after data separation is achieved, it is possible to dynamically reduce the training set by more than 99% without significant loss of performance. Interestingly, the resulting subset of "high capacity" features is not consistent across different training runs, which is consistent with the theoretical claim that all training points should converge to the same asymptotic margin under SGD and in the presence of both batch normalization and weight decay.
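
To make the quantity in the abstract concrete, here is a minimal sketch of how a margin distribution and its area under the curve could be computed from a classifier's outputs. It rests on assumptions that are not stated in the abstract: the margin is taken as the true-class score minus the best competing score, the scores are assumed to be already normalized (for example by the product of layer weight norms), and the curve is the sorted margins plotted against the fraction of training points. The function names are hypothetical, and `smallest_margin_subset` is only one plausible selection rule for shrinking the training set, not necessarily the authors' procedure.

```python
import numpy as np


def classification_margins(logits, labels):
    """Per-example margin: true-class score minus the best other-class score.

    Assumes `logits` has shape (n_examples, n_classes) and is already
    normalized; a positive margin means the example is classified correctly.
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    rows = np.arange(logits.shape[0])
    true_scores = logits[rows, labels]
    # Mask out the true class so the max picks the strongest competitor.
    masked = logits.copy()
    masked[rows, labels] = -np.inf
    return true_scores - masked.max(axis=1)


def margin_auc(margins):
    """Area under the sorted-margin curve over the unit interval.

    The x-axis is the fraction of training points (0 to 1), so the result
    is comparable across training sets of different sizes.
    """
    m = np.sort(np.asarray(margins, dtype=float))
    if m.size < 2:
        raise ValueError("need at least two margins to integrate")
    x = np.linspace(0.0, 1.0, m.size)
    # Trapezoidal rule written out explicitly.
    return float(np.sum((m[1:] + m[:-1]) / 2.0 * np.diff(x)))


def smallest_margin_subset(margins, keep_fraction):
    """Indices of the `keep_fraction` of examples with the smallest margins.

    Hypothetical selection rule: keep the points closest to the decision
    boundary and drop the rest.
    """
    margins = np.asarray(margins, dtype=float)
    k = max(1, int(np.ceil(keep_fraction * margins.size)))
    return np.argsort(margins)[:k]


# Toy example: three examples, three classes, all classified correctly.
toy_logits = [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [1.0, 1.0, 5.0]]
toy_labels = [0, 1, 2]
toy_margins = classification_margins(toy_logits, toy_labels)
toy_auc = margin_auc(toy_margins)
toy_subset = smallest_margin_subset(toy_margins, keep_fraction=0.5)
```

In a real experiment the margins would be recomputed during training, since the abstract describes the reduction as dynamic and reports that the retained subset differs across runs.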

