LG, Jun 7, 2023

Sparse Linear Centroid-Encoder: A Convex Method for Feature Selection

Tomojit Ghosh, Michael Kirby, Karim Karimov

arXiv:2306.04824v2 | 2.0 | h-index: 22

Originality: Incremental advance

AI Analysis

The paper addresses feature selection for high-dimensional data such as biological datasets. The advance is incremental: it builds on existing linear methods and contributes a new convex formulation.

The paper tackles the problem of feature selection in multi-class data by proposing Sparse Linear Centroid-Encoder (SLCE), a convex method that reconstructs points as class centroids with an ℓ₁-norm penalty, and shows it outperforms some state-of-the-art neural network-based techniques in experiments.

We present a novel feature selection technique, Sparse Linear Centroid-Encoder (SLCE). The algorithm uses a linear transformation to reconstruct a point as its class centroid and, at the same time, uses the $\ell_1$-norm penalty to filter out unnecessary features from the input data. The original formulation of the optimization problem is nonconvex, but we propose a two-step approach, where each step is convex. In the first step, we solve the linear Centroid-Encoder, a convex optimization problem over a matrix $A$. In the second step, we only search for a sparse solution over a diagonal matrix $B$ while keeping $A$ fixed. Unlike other linear methods, e.g., Sparse Support Vector Machines and Lasso, Sparse Linear Centroid-Encoder uses a single model for multi-class data. We present an in-depth empirical analysis of the proposed model and show that it promotes sparsity on various data sets, including high-dimensional biological data. Our experimental results show that SLCE has a performance advantage over some state-of-the-art neural network-based feature selection techniques.
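The two-step procedure described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact objective, so the sketch assumes a squared reconstruction error with the model $A B x$, where $B = \mathrm{diag}(b)$ scales the input features. Samples are stored as rows, so `W` below plays the role of $A^\top$. The function names, the proximal-gradient (ISTA) solver for step 2, the value of `lam`, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def class_centroid_targets(X, y):
    """Return an (n, d) matrix whose i-th row is the centroid of sample i's class."""
    T = np.empty_like(X, dtype=float)
    for label in np.unique(y):
        mask = y == label
        T[mask] = X[mask].mean(axis=0)
    return T


def fit_linear_map(X, T):
    """Step 1 (assumed form): least-squares fit of W so that X @ W approximates T."""
    W, *_ = np.linalg.lstsq(X, T, rcond=None)
    return W


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def fit_sparse_diagonal(X, T, W, lam, n_iter=500):
    """Step 2 (assumed form): with W fixed, minimise over b
        0.5 * ||(X * b) @ W - T||_F^2 + lam * ||b||_1
    using proximal gradient descent (ISTA). The problem is convex in b."""
    d = X.shape[1]
    b = np.ones(d)
    # The smooth term is quadratic in b with Hessian (X^T X) * (W W^T)
    # (elementwise product); its largest eigenvalue is the Lipschitz constant.
    hessian = (X.T @ X) * (W @ W.T)
    lipschitz = max(np.linalg.eigvalsh(hessian).max(), 1e-12)
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        residual = (X * b) @ W - T
        grad = np.sum((X.T @ residual) * W, axis=1)
        b = soft_threshold(b - step * grad, step * lam)
    return b


# Hypothetical synthetic data: three classes, eight features.
rng = np.random.default_rng(0)
n, d = 150, 8
y = rng.integers(0, 3, size=n)
X = rng.normal(size=(n, d))
X[:, 0] += 3.0 * y          # feature 0 carries class information
X[:, 1] -= 2.0 * (y == 1)   # feature 1 separates class 1 from the rest

T = class_centroid_targets(X, y)            # reconstruction targets
W = fit_linear_map(X, T)                    # step 1: convex in W
b = fit_sparse_diagonal(X, T, W, lam=5.0)   # step 2: convex in b
selected = np.flatnonzero(b)                # indices of retained features
```

Features whose entry in `b` is driven exactly to zero are discarded, and a larger `lam` removes more of them. Because a single `W` and a single `b` are fit against all class centroids at once, the sketch reflects the abstract's point that one model handles multi-class data without a one-vs-rest decomposition.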
