High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso
This paper addresses feature selection for high-dimensional datasets and offers a scalable method for capturing non-linear dependencies, although the contribution is incremental because it builds on the existing Lasso framework.
The paper tackles supervised feature selection in high-dimensional data by proposing a feature-wise kernelized Lasso that captures non-linear input-output dependencies, and it demonstrates the method's effectiveness in experiments with thousands of features.
The goal of supervised feature selection is to find a subset of input features that are responsible for predicting output values. The least absolute shrinkage and selection operator (Lasso) allows computationally efficient feature selection based on linear dependency between input features and output values. In this paper, we consider a feature-wise kernelized Lasso for capturing non-linear input-output dependency. We first show that, with particular choices of kernel functions, non-redundant features with strong statistical dependence on output values can be found in terms of kernel-based independence measures. We then show that the globally optimal solution can be efficiently computed; this makes the approach scalable to high-dimensional problems. The effectiveness of the proposed method is demonstrated through feature selection experiments with thousands of features.
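The idea summarized above can be made concrete with a small sketch. It is an illustration under stated assumptions, not the paper's implementation: it assumes a Gaussian kernel with a fixed bandwidth `sigma` for every feature and for the output, double-centers each Gram matrix, and fits a non-negative Lasso on the vectorized matrices with plain coordinate descent. The function names, the bandwidth, the value of `lam`, and the choice of solver are all hypothetical choices made for brevity.

```python
import math

def gaussian_gram(x, sigma=1.0):
    """Gram matrix of a Gaussian kernel over a list of scalars (assumed kernel)."""
    return [[math.exp(-(a - b) ** 2 / (2 * sigma ** 2)) for b in x] for a in x]

def center_and_flatten(K):
    """Double-center a symmetric Gram matrix (H K H) and flatten it to a vector."""
    n = len(K)
    row = [sum(r) / n for r in K]   # row means (equal to column means by symmetry)
    total = sum(row) / n            # grand mean
    return [K[i][j] - row[i] - row[j] + total for i in range(n) for j in range(n)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def featurewise_kernel_lasso(X_cols, y, lam, n_iter=100):
    """Minimize 0.5 * ||L - sum_k alpha_k K_k||^2 + lam * sum_k alpha_k, alpha >= 0.

    X_cols: list of features, each a list of n scalar values.
    L and K_k are the centered Gram matrices of the output and of feature k.
    Coordinate descent is a stand-in solver chosen for brevity.
    """
    target = center_and_flatten(gaussian_gram(y))
    basis = [center_and_flatten(gaussian_gram(col)) for col in X_cols]
    d = len(basis)
    # Feature-feature and feature-output dependence (empirical HSIC up to scaling).
    G = [[dot(basis[k], basis[l]) for l in range(d)] for k in range(d)]
    c = [dot(b, target) for b in basis]
    alpha = [0.0] * d
    for _ in range(n_iter):
        for k in range(d):
            # Dependence on the output not yet explained by the other features.
            r = c[k] - sum(G[k][l] * alpha[l] for l in range(d) if l != k)
            alpha[k] = max(0.0, (r - lam) / G[k][k]) if G[k][k] > 0 else 0.0
    return alpha

# Toy data: the output is a non-linear function of the first feature only.
x0 = [4 * i / 19 - 2 for i in range(20)]
x1 = [x0[(7 * i) % 20] for i in range(20)]   # shuffled copy, unrelated to y
y = [v ** 2 for v in x0]
print(featurewise_kernel_lasso([x0, x1], y, lam=0.1))
```

Features whose coefficient stays at zero are discarded. Because the objective is convex, a solver of this kind reaches a global minimizer, and the cross terms between centered Gram matrices penalize selecting features that depend strongly on each other, which is how redundancy is discouraged in this sketch.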