An Efficient Approach to Sparse Linear Discriminant Analysis
This work addresses the need for efficient and interpretable sparse LDA methods in high-dimensional domains like genomics, though it appears incremental as it builds on existing penalized LDA and Optimal Scoring frameworks.
The paper tackles sparse Linear Discriminant Analysis (LDA) by proposing an approach based on penalized Optimal Scoring, which is exactly equivalent to penalized LDA and uses a group-Lasso penalty to select the same features across all discriminant directions. The result is an efficient algorithm that, in the reported experiments, generates extremely parsimonious models without compromising prediction performance, and that is well suited to gene expression data.
We present a novel approach to the formulation and the resolution of sparse Linear Discriminant Analysis (LDA). Our proposal is based on penalized Optimal Scoring. It has an exact equivalence with penalized LDA, contrary to the multi-class approaches based on the regression of class indicators that have been proposed so far. Sparsity is obtained thanks to a group-Lasso penalty that selects the same features in all discriminant directions. Our experiments demonstrate that this approach generates extremely parsimonious models without compromising prediction performance. Besides prediction, the resulting sparse discriminant directions are also amenable to low-dimensional representations of data. Our algorithm is highly efficient for problems with a medium to large number of variables, and is thus particularly well suited to the analysis of gene expression data.
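To illustrate how a group-Lasso penalty can select the same features in all discriminant directions, the sketch below applies row-wise group soft-thresholding to a coefficient matrix whose rows are features and whose columns are discriminant directions. This is the standard proximal operator of the group-Lasso penalty, not the algorithm of the paper; the function names, the threshold value, and the toy matrix are assumptions made for illustration only.

```python
import numpy as np


def group_soft_threshold(B, threshold):
    """Row-wise proximal operator of threshold * sum_j ||B[j, :]||_2.

    Hypothetical helper: each row of B holds one feature's coefficients
    across all discriminant directions, so a row is shrunk or zeroed as
    a single group.
    """
    B = np.asarray(B, dtype=float)
    row_norms = np.linalg.norm(B, axis=1, keepdims=True)
    # Avoid division by zero for rows that are already exactly zero.
    safe_norms = np.where(row_norms > 0, row_norms, 1.0)
    # Rows whose norm is at most the threshold get a scale of 0.
    scale = np.maximum(0.0, 1.0 - threshold / safe_norms)
    return B * scale


def selected_features(B, tol=1e-12):
    """Indices of features with a nonzero row, i.e. kept in every direction."""
    return np.flatnonzero(np.linalg.norm(B, axis=1) > tol)


# Toy coefficients (assumed values): 4 features, 2 discriminant directions.
B = np.array([
    [3.0, 4.0],    # row norm 5.0  -> shrunk, kept
    [0.3, 0.4],    # row norm 0.5  -> zeroed
    [0.0, 0.0],    # already zero  -> stays zero
    [-6.0, 8.0],   # row norm 10.0 -> shrunk, kept
])

B_sparse = group_soft_threshold(B, threshold=1.0)
kept = selected_features(B_sparse)
```

Because the penalty acts on the norm of a whole row, a feature is either dropped from every discriminant direction or retained in all of them, which is the shared-support behaviour described in the abstract.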