Binary Stochastic Filtering: feature selection and beyond
This work gives machine learning practitioners a novel, efficient feature selection method that can be applied to any existing architecture, although it is an incremental extension of existing sparsity regularization approaches.
The paper tackles feature selection in neural networks with a method that stochastically penalizes feature involvement rather than layer weights. This enables automatic feature selection with minimal or no computational overhead, and the method is reported to be more efficient than a few classical methods.
Feature selection is one of the most decisive tools in understanding data and machine learning models. Among other methods, sparsity induced by an $L^{1}$ penalty is one of the simplest and best-studied approaches to this problem. Although such regularization is frequently used in neural networks to achieve sparsity of weights or unit activations, it is unclear how it can be employed for feature selection. This work aims to extend neural networks with the ability to select features automatically by rethinking how sparsity regularization can be used, namely, by stochastically penalizing feature involvement instead of the layer weights. The proposed method demonstrated superior efficiency when compared with a few classical methods, achieved with minimal or no computational overhead, and it can be applied directly to any existing architecture. Furthermore, the method generalizes easily to neuron pruning and to the selection of regions of importance for spectral data.
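To make the idea of penalizing feature involvement rather than layer weights more concrete, the following is a minimal sketch of one possible reading of it, not the authors' implementation. It assumes that each input feature carries a trainable pass-through probability in $[0, 1]$, that a feature is kept or zeroed by a Bernoulli draw during training, and that the $L^{1}$ penalty is applied to these probabilities. The names `stochastic_filter` and `l1_penalty`, the 0.5 inference threshold, and the `strength` parameter are hypothetical. Only the forward pass and the penalty term are shown; how gradients are passed through the sampling step is not specified in the text above and is left out.

```python
import random


def stochastic_filter(x, w, rng, training=True):
    """Gate each feature x[i] using its (assumed) pass-through probability w[i].

    Training: feature i is kept with probability w[i], otherwise zeroed.
    Inference: feature i is kept only if w[i] >= 0.5 (an assumed threshold).
    """
    out = []
    for value, weight in zip(x, w):
        p = min(max(weight, 0.0), 1.0)  # clip to a valid probability
        if training:
            keep = rng.random() < p  # Bernoulli(p) draw
        else:
            keep = p >= 0.5
        out.append(value if keep else 0.0)
    return out


def l1_penalty(w, strength):
    """L1 penalty on the filter probabilities, not on the layer weights."""
    return strength * sum(abs(weight) for weight in w)


if __name__ == "__main__":
    rng = random.Random(0)
    features = [1.0, 1.0, 1.0]
    probabilities = [0.0, 1.0, 0.5]
    print(stochastic_filter(features, probabilities, rng))
    print(stochastic_filter(features, probabilities, rng, training=False))
    print(l1_penalty(probabilities, strength=0.1))
```

Under these assumptions, a probability driven to zero by the penalty removes the corresponding feature entirely, which is what would turn the penalty into a feature selector. The filter sits in front of the network and does not alter its layers, consistent with the claim that the method applies to any existing architecture.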