Feature Selection via Regularized Trees
This work provides an efficient feature selection solution for practical problems using tree models, but the contribution is incremental because it builds on existing tree regularization concepts.
The authors tackle feature selection in tree-based models by introducing a regularization framework that penalizes splitting on a new feature when its gain is similar to that of features already used in earlier splits. In their experimental studies, the regularized trees select high-quality feature subsets with regard to both strong and weak classifiers.
We propose a tree regularization framework, which enables many tree models to perform feature selection efficiently. The key idea of the regularization framework is to penalize selecting a new feature for splitting when its gain (e.g. information gain) is similar to the features used in previous splits. The regularization framework is applied on random forest and boosted trees here, and can be easily applied to other tree models. Experimental studies show that the regularized trees can select high-quality feature subsets with regard to both strong and weak classifiers. Because tree models can naturally deal with categorical and numerical variables, missing values, different scales between variables, interactions and nonlinearities etc., the tree regularization framework provides an effective and efficient feature selection solution for many practical problems.
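The key idea can be illustrated with a small sketch. The abstract does not state the exact form of the penalty, so the code below assumes one simple option: the gain of a feature not yet used in any split is multiplied by a coefficient `lam` in (0, 1], while already-used features keep their full gain. The names `pick_split_feature`, `select_features`, and `lam`, as well as the toy gain values, are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Set, Tuple


def pick_split_feature(
    gains: Dict[str, float], used: Set[str], lam: float = 0.8
) -> Tuple[str, float]:
    """Return the feature with the highest penalized gain at one node.

    Assumption: features not yet in `used` have their gain scaled by `lam`,
    so a new feature wins only if its gain clearly exceeds that of the
    features already selected.
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must be in (0, 1]")
    if not gains:
        raise ValueError("gains must not be empty")
    best_feature, best_score = "", float("-inf")
    for feature, gain in gains.items():
        score = gain if feature in used else lam * gain
        if score > best_score:
            best_feature, best_score = feature, score
    return best_feature, best_score


def select_features(
    gains_per_node: Iterable[Dict[str, float]], lam: float = 0.8
) -> Set[str]:
    """Walk over nodes in order and collect the features chosen for splits."""
    used: Set[str] = set()
    for gains in gains_per_node:
        feature, score = pick_split_feature(gains, used, lam)
        if score > 0.0:  # only split when there is some gain
            used.add(feature)
    return used


# Toy gains for two nodes: "a" and "b" are nearly interchangeable.
nodes = [
    {"a": 0.50, "b": 0.48},
    {"a": 0.30, "b": 0.32},
]
regularized = select_features(nodes, lam=0.8)
unregularized = select_features(nodes, lam=1.0)
```

In this toy case the penalized run keeps only feature `a`, because `b` never beats it by a wide enough margin, whereas setting `lam` to 1.0 (no penalty) selects both features. A real tree learner would compute the gains from data at each node rather than take them as input.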