Gradient-based Laplacian Feature Selection
This work addresses the problem of identifying relevant features without labels, a need shared by researchers in fields such as object recognition and computational biology, and represents an incremental improvement over existing unsupervised methods.
The paper tackles unsupervised feature selection in high-dimensional noisy data by proposing Gradient-based Laplacian Feature Selection (GLFS), which minimizes variance in a Laplacian regularized model to select sparse, relevant features, achieving superior performance over state-of-the-art methods on simulated and real-world datasets.
Analysis of high-dimensional noisy data is essential across a variety of research fields. Feature selection techniques are designed to find the relevant feature subset that can facilitate classification or pattern detection. Traditional (supervised) feature selection methods use label information to guide the identification of relevant feature subsets. In this paper, however, we consider the unsupervised feature selection problem. Without label information, it is particularly difficult to identify a small set of relevant features, because the noise in real-world data corrupts the intrinsic structure of the data. Our Gradient-based Laplacian Feature Selection (GLFS) selects important features by minimizing the variance of the Laplacian regularized least squares regression model. With $\ell_1$ relaxation, GLFS can find a sparse subset of features that is relevant to the Laplacian manifolds. Extensive experiments on simulated data, three real-world object recognition datasets, and two computational biology datasets illustrate the power and superior performance of our approach over multiple state-of-the-art unsupervised feature selection methods. Additionally, we show that GLFS selects a sparser set of more relevant features in a supervised setting, outperforming the popular elastic net methodology.
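For readers unfamiliar with Laplacian regularization, the following is a minimal sketch of the graph Laplacian that such models are built on. It is not the GLFS algorithm: the abstract does not specify the objective, so this only illustrates the standard smoothness penalty $f^\top L f = \tfrac{1}{2}\sum_{i,j} W_{ij}(f_i - f_j)^2$. The function names `knn_graph_laplacian` and `smoothness`, the Gaussian weighting, and the parameters `k` and `sigma` are assumptions chosen for illustration.

```python
import numpy as np

def knn_graph_laplacian(X, k=5, sigma=1.0):
    """Unnormalized Laplacian L = D - W of a symmetrized kNN graph.

    Illustrative assumption: Gaussian edge weights with bandwidth sigma.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances (clipped at 0 for numerical safety).
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    # Keep the k strongest neighbours of each sample, then symmetrize.
    idx = np.argsort(-W, axis=1)[:, :k]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.arange(n)[:, None], idx] = True
    mask = mask | mask.T
    W = np.where(mask, W, 0.0)
    L = np.diag(W.sum(axis=1)) - W
    return L, W

def smoothness(f, L):
    """Laplacian penalty f^T L f; small when f varies little along graph edges."""
    return float(f @ L @ f)

# Synthetic data, used only to exercise the functions above.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
L, W = knn_graph_laplacian(X, k=5)
f = X[:, 0]  # treat one feature column as a function on the graph
pairwise = 0.5 * np.sum(W * (f[:, None] - f[None, :]) ** 2)
```

Here `smoothness(f, L)` and `pairwise` compute the same quantity in two ways. A Laplacian regularized regression adds a penalty of this form to a least squares loss so that predictions vary smoothly over the data graph.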