Traces of Class/Cross-Class Structure Pervade Deep Learning Spectra
The work provides insights into deep learning optimization and generalization, with potential applications in improving training algorithms. It is incremental, however, in that it builds on existing spectral analysis work.
The paper identifies a class/cross-class structure in deep learning spectra, shows that it explains spectral features such as outliers and bumps, and proves that the ratio of outliers to bulk in the spectrum of the Fisher information matrix predicts misclassification in multinomial logistic regression.
Numerous researchers have recently applied empirical spectral analysis to the study of modern deep learning classifiers. We identify and discuss an important formal class/cross-class structure and show how it lies at the origin of many visually striking features observed in deepnet spectra, some of which were reported in recent articles, while others are unveiled here for the first time. These include spectral outliers, "spikes", and small but distinct continuous distributions, "bumps", often seen beyond the edge of a "main bulk". The significance of the cross-class structure is illustrated in three ways: (i) we prove that the ratio of outliers to bulk in the spectrum of the Fisher information matrix is predictive of misclassification, in the context of multinomial logistic regression; (ii) we demonstrate how, gradually with depth, a network is able to separate class-distinctive information from class variability, all while orthogonalizing the class-distinctive information; and (iii) we propose a correction to KFAC, a well-known second-order optimization algorithm for training deepnets.
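
To make the idea of outliers separated from a main bulk concrete, the following is a minimal sketch on synthetic data, not a reproduction of the paper's experiments. It assumes Gaussian features around well-separated class means and uses the standard split of a second-moment matrix into a class-mean part and a within-class part, which is only a simplified stand-in for the class/cross-class structure described above. The function name `class_decomposition` and all sizes and scales are hypothetical choices made for illustration.

```python
import numpy as np

def class_decomposition(features, labels):
    """Split the second-moment matrix into class-mean and within-class parts.

    Illustrative helper (an assumption of this sketch, not the paper's code).
    """
    n, d = features.shape
    between = np.zeros((d, d))
    within = np.zeros((d, d))
    for c in np.unique(labels):
        x_c = features[labels == c]
        mu_c = x_c.mean(axis=0)
        # Class-mean contribution, weighted by the class frequency.
        between += (len(x_c) / n) * np.outer(mu_c, mu_c)
        # Variability of the samples around their own class mean.
        centered = x_c - mu_c
        within += centered.T @ centered / n
    return between, within

# Synthetic data: 5 classes, 400 samples each, 50 dimensions (arbitrary choices).
rng = np.random.default_rng(0)
num_classes, per_class, dim = 5, 400, 50
means = 6.0 * rng.standard_normal((num_classes, dim))
labels = np.repeat(np.arange(num_classes), per_class)
features = means[labels] + rng.standard_normal((num_classes * per_class, dim))

between, within = class_decomposition(features, labels)
second_moment = features.T @ features / len(features)

# Eigenvalues in descending order: the leading ones come from the class means.
eigvals = np.linalg.eigvalsh(second_moment)[::-1]
outliers = eigvals[:num_classes]
bulk_edge = eigvals[num_classes]

print("smallest outlier:", outliers.min())
print("largest bulk eigenvalue:", bulk_edge)
```

In this toy setting the class-mean part has rank at most the number of classes, so at most that many eigenvalues can sit far above the bulk produced by within-class variability. How closely real deepnet spectra follow this picture is the subject of the paper itself.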