LG · ML · Oct 9, 2018

Feature Selection using Stochastic Gates

arXiv:1810.04247v7 · 38 citations
Originality: Incremental advance
AI Analysis

The paper addresses feature selection in non-linear settings, which are less studied than linear ones. The advance is incremental because it builds on existing ℓ0-norm and relaxation techniques.

The paper tackles feature selection for high-dimensional non-linear functions. It minimizes the ℓ0 norm of a vector of feature-selection indicators through a continuous relaxation of Bernoulli distributions, so the loss is minimized and the relevant features are selected simultaneously by gradient descent. The authors demonstrate the method's potential on synthetic and real-life applications.

Feature selection problems have been extensively studied for linear estimation, for instance, Lasso, but less emphasis has been placed on feature selection for non-linear functions. In this study, we propose a method for feature selection in high-dimensional non-linear function estimation problems. The new procedure is based on minimizing the $\ell_0$ norm of the vector of indicator variables that represent if a feature is selected or not. Our approach relies on the continuous relaxation of Bernoulli distributions, which allows our model to learn the parameters of the approximate Bernoulli distributions via gradient descent. This general framework simultaneously minimizes a loss function while selecting relevant features. Furthermore, we provide an information-theoretic justification of incorporating Bernoulli distribution into our approach and demonstrate the potential of the approach on synthetic and real-life applications.
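The abstract does not spell out which relaxation is used, so the following is only a minimal sketch under stated assumptions. It assumes a clipped-Gaussian gate, z_d = clip(mu_d + eps_d, 0, 1) with eps_d ~ N(0, sigma^2), as one possible continuous stand-in for a Bernoulli indicator. Under that assumption the expected ℓ0 norm of the gate vector is the sum over features of P(z_d > 0) = Phi(mu_d / sigma), which is differentiable in mu. The function names, the linear predictor, and the values of `sigma` and `lam` are all hypothetical choices for illustration; a non-linear model would replace the linear predictor in practice.

```python
import math
import numpy as np


def sample_gates(mu, sigma, rng):
    """Draw relaxed gates z = clip(mu + eps, 0, 1), eps ~ N(0, sigma^2).

    Assumption: a clipped Gaussian replaces the Bernoulli indicator so
    that z depends on mu in a (piecewise) differentiable way.
    """
    eps = rng.normal(0.0, sigma, size=mu.shape)
    return np.clip(mu + eps, 0.0, 1.0)


def expected_l0(mu, sigma):
    """Expected number of open gates: sum_d P(z_d > 0) = sum_d Phi(mu_d / sigma)."""
    return sum(
        0.5 * (1.0 + math.erf(m / (sigma * math.sqrt(2.0)))) for m in mu
    )


def gated_loss(w, mu, X, y, sigma, lam, rng):
    """Squared-error loss on gated inputs plus lam * expected l0 penalty.

    The linear predictor (X * z) @ w is a hypothetical stand-in for an
    arbitrary non-linear model applied to the gated features.
    """
    z = sample_gates(mu, sigma, rng)
    pred = (X * z) @ w
    return float(np.mean((pred - y) ** 2) + lam * expected_l0(mu, sigma))


# Small usage example on synthetic data: only feature 0 drives the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))
y = 2.0 * X[:, 0]
w = np.ones(4)
mu = np.full(4, 0.5)  # gate means, the parameters a gradient method would update
loss = gated_loss(w, mu, X, y, sigma=0.5, lam=0.1, rng=rng)
```

In a full training loop, `mu` and the model weights would be updated jointly by a gradient-based optimizer, and features whose gate mean is driven to zero or below would be treated as unselected.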

Code Implementations: 1 repo
