On Correlation of Features Extracted by Deep Neural Networks
This work addresses the problem of feature redundancy in DNNs for researchers and practitioners, but it is incremental as it builds on known properties without introducing new methods.
The study investigated how network size, activation functions, and weight initialization affect the extraction of redundant features in deep neural networks, finding that size and activation functions are the most influential factors in promoting redundancy.
Redundancy in deep neural network (DNN) models has long been one of their most intriguing and important properties. DNNs have been shown to be overparameterized, that is, to extract many redundant features. In this work, we explore the impact of size (both width and depth), activation function, and weight initialization on the susceptibility of deep neural network models to extract redundant features. To estimate the number of redundant features in each layer, all the features of a given layer are hierarchically clustered according to their pairwise cosine distances in feature space and a set threshold. It is shown that network size and activation function are the two most important factors fostering the tendency of DNNs to extract redundant features. The concept is illustrated using deep multilayer perceptrons on MNIST digit recognition and convolutional neural networks on the CIFAR-10 dataset.
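The redundancy estimate described above can be illustrated with a minimal sketch. It assumes each feature of a layer is available as a plain vector, and it uses average-linkage agglomerative clustering, which is an illustrative choice since the abstract does not name a linkage rule. The function names (`cosine_distance`, `cluster_features`, `count_redundant`) and the threshold value in the usage note are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; assumes both vectors are nonzero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def cluster_features(features, threshold):
    """Group feature indices by agglomerative clustering.

    Average linkage is an assumption made for this sketch. Merging stops
    once the closest pair of clusters is farther apart than `threshold`.
    """
    clusters = [[i] for i in range(len(features))]
    # Precompute all pairwise cosine distances between individual features.
    dist = [[cosine_distance(u, v) for v in features] for u in features]

    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        if d > threshold:
            break  # No remaining pair is similar enough to merge.
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


def count_redundant(features, threshold):
    """Redundant features = total features minus distinct clusters."""
    return len(features) - len(cluster_features(features, threshold))


# Toy layer with five 2-D features: two near-duplicate pairs and one outlier.
toy_features = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.02, 1.0], [-1.0, 0.0]]
toy_clusters = cluster_features(toy_features, threshold=0.1)
toy_redundant = count_redundant(toy_features, threshold=0.1)
```

In this toy example the two near-duplicate pairs each collapse into one cluster, so two of the five features count as redundant. The pairwise search is cubic in the number of features, which is acceptable for a sketch but would need a library implementation for wide layers.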