Feature Selection via L1-Penalized Squared-Loss Mutual Information
This work addresses feature selection for machine learning practitioners by accounting for feature interaction, but it appears incremental: it builds on existing mutual information methods and contributes a new variant.
The paper addresses feature interaction, an issue often overlooked in feature selection. It proposes L1-LSMI, an L1-regularization-based algorithm that maximizes squared-loss mutual information between the selected features and the outputs. Numerical results show that it performs well in handling redundancy, detecting non-linear dependency, and considering feature interaction.
Feature selection is a technique to screen out less important features. Many existing supervised feature selection algorithms use redundancy and relevancy as the main criteria for selecting features. However, feature interaction, potentially a key characteristic of real-world problems, has not received much attention. As an attempt to take feature interaction into account, we propose L1-LSMI, an L1-regularization-based algorithm that maximizes a squared-loss variant of mutual information between the selected features and the outputs. Numerical results show that L1-LSMI performs well in handling redundancy, detecting non-linear dependency, and considering feature interaction.
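To illustrate why feature interaction matters for a dependence measure of this kind, the following is a minimal sketch on a toy XOR problem. It assumes the standard definition of squared-loss mutual information, SMI = 1/2 * sum over (x, y) of p(x) p(y) (p(x, y) / (p(x) p(y)) - 1)^2, and uses a simple plug-in estimate for discrete variables. It is not the LSMI estimator or the L1-regularized optimization proposed in the paper; the function name discrete_smi and the toy data are hypothetical and introduced only for illustration.

```python
from collections import Counter
from itertools import product


def discrete_smi(xs, ys):
    """Plug-in squared-loss mutual information for discrete samples.

    Illustrative helper (an assumption, not the paper's estimator):
    SMI = 1/2 * sum_{x,y} p(x) p(y) * (p(x,y) / (p(x) p(y)) - 1)^2
    """
    n = len(xs)
    px = Counter(xs)            # marginal counts of x
    py = Counter(ys)            # marginal counts of y
    pxy = Counter(zip(xs, ys))  # joint counts of (x, y)
    total = 0.0
    for x, y in product(px, py):
        marg = (px[x] / n) * (py[y] / n)   # p(x) p(y)
        ratio = (pxy[(x, y)] / n) / marg   # density ratio p(x,y) / (p(x) p(y))
        total += 0.5 * marg * (ratio - 1.0) ** 2
    return total


# Toy data: every combination of three binary features, equally weighted.
# The output depends on x1 and x2 only through their interaction (XOR);
# x3 is irrelevant.
rows = list(product([0, 1], repeat=3))
x1 = [r[0] for r in rows]
x2 = [r[1] for r in rows]
x3 = [r[2] for r in rows]
y = [a ^ b for a, b in zip(x1, x2)]

smi_x1 = discrete_smi(x1, y)                    # single feature
smi_x2 = discrete_smi(x2, y)                    # single feature
smi_x3 = discrete_smi(x3, y)                    # irrelevant feature
smi_pair = discrete_smi(list(zip(x1, x2)), y)   # the two features jointly

print(smi_x1, smi_x2, smi_x3, smi_pair)
```

In this toy setting each feature scored on its own has zero dependence with the output, while x1 and x2 scored jointly have strictly positive dependence. A criterion that ranks features one at a time would therefore discard both interacting features, which is the kind of failure that scoring a feature subset as a whole is meant to avoid.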