On the Limitation of Kernel Dependence Maximization for Feature Selection
This is an incremental analysis that highlights limitations of kernel-based feature selection methods for researchers who use them.
The paper demonstrates that feature selection via HSIC maximization can fail to identify critical features, showing through counterexamples that the method's rationale is flawed.
A simple and intuitive method for feature selection consists of choosing the feature subset that maximizes a nonparametric measure of dependence between the response and the features. A popular proposal from the literature uses the Hilbert-Schmidt Independence Criterion (HSIC) as the nonparametric dependence measure. The rationale behind this approach to feature selection is that important features will exhibit a high dependence with the response and their inclusion in the set of selected features will increase the HSIC. Through counterexamples, we demonstrate that this rationale is flawed and that feature selection via HSIC maximization can miss critical features.
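To make the procedure under discussion concrete, the following is a minimal sketch of feature selection by HSIC maximization. It rests on several assumptions not stated in the abstract: a Gaussian kernel with a fixed bandwidth of 1.0 on both features and response, the biased empirical HSIC estimate trace(K H L H) / n^2, and an exhaustive search over all nonempty feature subsets. The function names `gaussian_gram`, `hsic`, and `best_subset_by_hsic` are hypothetical and chosen for illustration only. The paper's own kernels, estimator, and optimization scheme may differ, and this sketch does not reproduce its counterexamples.

```python
import itertools

import numpy as np


def gaussian_gram(Z, bandwidth=1.0):
    """Gram matrix of the Gaussian kernel exp(-||z - z'||^2 / (2 * bandwidth^2))."""
    Z = np.asarray(Z, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, None]  # treat a 1-D array as n samples of one variable
    sq_norms = np.sum(Z ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (Z @ Z.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))


def hsic(X, y, bandwidth=1.0):
    """Biased empirical HSIC between features X and response y.

    Computes trace(K H L H) / n^2, where K and L are Gram matrices of X and y
    and H = I - (1/n) 11^T is the centering matrix.
    """
    n = len(y)
    K = gaussian_gram(X, bandwidth)
    L = gaussian_gram(y, bandwidth)
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(K @ H @ L @ H)) / n ** 2


def best_subset_by_hsic(X, y, bandwidth=1.0):
    """Exhaustive search for the nonempty feature subset with the largest HSIC.

    Assumption: brute force over all 2^p - 1 subsets, which is only feasible
    for a small number of features p.
    """
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    best_subset, best_value = None, -np.inf
    for size in range(1, p + 1):
        for subset in itertools.combinations(range(p), size):
            value = hsic(X[:, list(subset)], y, bandwidth)
            if value > best_value:
                best_subset, best_value = subset, value
    return best_subset, best_value
```

Calling `best_subset_by_hsic(X, y)` on an `(n, p)` feature array and a length-`n` response returns the selected column indices together with the attained HSIC value. The abstract's claim is that the subset returned by such a maximization need not contain every feature the response depends on.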