Embedded methods for feature selection in neural networks
This work addresses the problem of high-dimensional, noisy input features in neural networks and is aimed at researchers and practitioners. The contribution is incremental, since it builds on existing feature selection techniques.
The paper tackles feature selection in neural networks with the goal of improving interpretability, generalizability, and training time. It proposes two integrated methods that incorporate feature selection directly into parameter learning. Both methods consistently outperform baselines such as Permutation Feature Importance on the MNIST, ISOLET, and HAR datasets.
The representational capacity of modern neural network architectures has made them a default choice in various applications with high-dimensional feature sets. However, high-dimensional and potentially noisy features, combined with black-box models such as neural networks, negatively affect the interpretability, generalizability, and training time of these models. Here, I propose two integrated approaches for feature selection that can be incorporated directly into parameter learning. The first adds a drop-in layer and performs sequential weight pruning. The second is a sensitivity-based approach. I benchmarked both methods against two baselines: Permutation Feature Importance (PFI), a general-purpose feature ranking method, and a random baseline. The proposed approaches turn out to be viable methods for feature selection and consistently outperform the baselines on the tested datasets (MNIST, ISOLET, and HAR). They can be added to any existing model with only a few lines of code.
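For readers unfamiliar with the PFI baseline named above, the general technique can be sketched as follows. This is a minimal NumPy illustration of permutation feature importance as it is commonly defined (shuffle one feature column, measure the drop in accuracy), not the benchmarking code used in this work. The function name `permutation_feature_importance`, the toy data, and the threshold-based `predict` function standing in for a trained network are all assumptions made for illustration.

```python
import numpy as np


def permutation_feature_importance(predict, X, y, n_repeats=5, seed=0):
    """Return the mean drop in accuracy when each feature column is shuffled.

    `predict` is any callable mapping an (n_samples, n_features) array
    to predicted labels; here it stands in for a trained model.
    """
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)  # accuracy on unmodified data
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its relationship with the labels
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - np.mean(predict(X_perm) == y))
        importances[j] = np.mean(drops)
    return importances


# Toy data (assumption): only feature 0 determines the label,
# features 1 and 2 are pure noise.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(500, 3))
y = (X[:, 0] > 0).astype(int)


def predict(X):
    # Hypothetical stand-in for a trained classifier
    return (X[:, 0] > 0).astype(int)


scores = permutation_feature_importance(predict, X, y)
ranking = np.argsort(scores)[::-1]  # most important feature first
```

In this toy setting the informative feature receives a large importance score, while the noise features score zero because the stand-in model ignores them. A feature selection step would then keep the top-ranked entries of `ranking`.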