Exploring Learned Representations of Neural Networks with Principal Component Analysis
This work addresses the problem of understanding feature representations in deep neural networks, a question of interest to researchers in explainable AI. Its contribution is incremental, since it applies existing methods to new data.
The study uses principal component analysis to analyze the learned layer-wise representations of a ResNet-18 trained on CIFAR-10. It finds that in some layers as little as 20% of the feature-space variance suffices for high-accuracy classification, and that the first ~100 principal components determine the performance of the k-NN and NCC classifiers.
Understanding feature representations in deep neural networks (DNNs) remains an open question within the general field of explainable AI. We use principal component analysis (PCA) to study the performance of a k-nearest neighbors (k-NN) classifier, a nearest class-centers (NCC) classifier, and support vector machines on the learned layer-wise representations of a ResNet-18 trained on CIFAR-10. We show that in certain layers, as little as 20% of the intermediate feature-space variance is necessary for high-accuracy classification and that across all layers, the first ~100 PCs completely determine the performance of the k-NN and NCC classifiers. We relate our findings to neural collapse and provide partial evidence for the related phenomenon of intermediate neural collapse. Our preliminary work provides three distinct yet interpretable surrogate models for feature representation, of which an affine linear model performs best. We also show that combining several surrogate models gives a way to estimate where neural collapse may first occur within the DNN.
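As a rough illustration of the kind of analysis described above, the following sketch projects features onto their leading principal components and then evaluates NCC and k-NN classifiers in the reduced space. It is not the authors' code: the features are synthetic Gaussian clusters standing in for one layer's activations, and every name and parameter here (`pca_fit`, `project`, `run_demo`, the 64-dimensional feature size, the choice of two components, k = 5) is an assumption made for the example.

```python
import numpy as np

def pca_fit(X, n_components):
    """Fit PCA via SVD; return the mean, top directions, and variance fraction kept."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    ratio = S**2 / np.sum(S**2)
    return mean, Vt[:n_components], ratio[:n_components].sum()

def project(X, mean, components):
    """Project data onto the retained principal directions."""
    return (X - mean) @ components.T

def sq_dists(A, B):
    """Pairwise squared Euclidean distances between rows of A and rows of B."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)

def ncc_predict(Z_train, y_train, Z_test):
    """Nearest class-center classifier: assign the label of the closest class mean."""
    classes = np.unique(y_train)
    centers = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    return classes[sq_dists(Z_test, centers).argmin(axis=1)]

def knn_predict(Z_train, y_train, Z_test, k=5):
    """k-NN classifier with majority vote (labels must be non-negative integers)."""
    nearest = np.argsort(sq_dists(Z_test, Z_train), axis=1)[:, :k]
    return np.array([np.bincount(row).argmax() for row in y_train[nearest]])

def run_demo(n_components=2, seed=0):
    """Synthetic stand-in for layer features (assumption: not real ResNet-18 activations)."""
    rng = np.random.default_rng(seed)
    n_classes, dim, n_per_class = 3, 64, 100
    class_means = rng.normal(scale=4.0, size=(n_classes, dim))
    y = np.repeat(np.arange(n_classes), n_per_class)
    X = class_means[y] + rng.normal(size=(y.size, dim))

    idx = rng.permutation(y.size)
    train, test = idx[:200], idx[200:]

    # Fit PCA on training features only, then project both splits.
    mean, comps, kept = pca_fit(X[train], n_components)
    Z_train = project(X[train], mean, comps)
    Z_test = project(X[test], mean, comps)

    return {
        "variance_kept": float(kept),
        "ncc": float((ncc_predict(Z_train, y[train], Z_test) == y[test]).mean()),
        "knn": float((knn_predict(Z_train, y[train], Z_test) == y[test]).mean()),
    }

if __name__ == "__main__":
    print(run_demo())
```

Sweeping `n_components` and recording accuracy against `variance_kept` would give a curve of the sort the abstract refers to; with real data, `X` would be replaced by flattened activations extracted from a chosen layer.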