Adjusted Measures for Feature Selection Stability for Data Sets with Similar Features
This work is aimed at researchers and practitioners who perform feature selection on data sets with correlated features. It is an incremental improvement over earlier adjusted stability measures.
The paper addresses the problem that feature selection stability measures misbehave on data sets with highly correlated or otherwise similar features. It introduces new adjusted measures that treat highly similar features as exchangeable, compares them with existing measures on artificial and real sets of selected features, and recommends one of the new measures based on the results.
For data sets with similar features, for example highly correlated features, most existing stability measures behave in an undesired way: they treat features that are almost identical but have different identifiers as different features. Existing adjusted stability measures, that is, stability measures that take the similarities between features into account, have major theoretical drawbacks. We introduce new adjusted stability measures that overcome these drawbacks. We compare them to each other and to existing stability measures based on both artificial and real sets of selected features. Based on the results, we suggest using one new stability measure that considers highly similar features as exchangeable.
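
The undesired behavior described above can be illustrated with a small sketch. It is not one of the measures proposed in the paper. It assumes a plain mean pairwise Jaccard index as the unadjusted measure, and it treats features as exchangeable by merging every pair whose similarity reaches a threshold into one group before comparing. The feature names, similarity values, threshold of 0.9, and helper functions are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets; two empty sets count as identical."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise(feature_sets, score):
    """Average a pairwise score over all pairs of selected feature sets."""
    pairs = list(combinations(feature_sets, 2))
    return sum(score(a, b) for a, b in pairs) / len(pairs)


def group_ids(features, similarity, threshold):
    """Map each feature to a group representative (union-find).

    Features whose pairwise similarity is >= threshold end up in the same
    group. This grouping rule is an assumption made for illustration only.
    """
    parent = {f: f for f in features}

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]  # path halving
            f = parent[f]
        return f

    for (f, g), s in similarity.items():
        if s >= threshold:
            parent[find(f)] = find(g)
    return {f: find(f) for f in features}


# Hypothetical data: x1 and x2 are almost identical, x3 and x4 are not.
features = ["x1", "x2", "x3", "x4"]
similarity = {("x1", "x2"): 0.98, ("x3", "x4"): 0.10}

# Three runs of a feature selector; run 2 picks x2 instead of x1.
selected = [{"x1", "x3"}, {"x2", "x3"}, {"x1", "x3"}]

# Unadjusted: x1 and x2 count as different features.
unadjusted = mean_pairwise(selected, jaccard)

# Adjusted: replace each feature by its group before comparing.
groups = group_ids(features, similarity, threshold=0.9)
collapsed = [{groups[f] for f in s} for s in selected]
adjusted = mean_pairwise(collapsed, jaccard)

print(f"unadjusted: {unadjusted:.3f}")
print(f"adjusted:   {adjusted:.3f}")
```

In this toy example the unadjusted score is penalized because one run selected x2 where the others selected x1, while the grouped variant regards the three selections as the same. Hard grouping by a threshold is only one simple way to encode exchangeability and should not be read as the definition of the recommended measure.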