Sparse Feature Selection in Kernel Discriminant Analysis via Optimal Scoring
This work targets classification accuracy and interpretability, both of which matter to data analysts, but the contribution is incremental because it builds on existing kernel and optimal scoring methods.
The paper tackles the two-group classification problem by proposing a kernel classifier that comes with theoretical guarantees on expected risk consistency and that uses structured sparsity for feature selection. Numerical studies show that it classifies better than existing nonparametric classifiers.
We consider the two-group classification problem and propose a kernel classifier based on the optimal scoring framework. Unlike previous approaches, we provide theoretical guarantees on the expected risk consistency of the method. We also allow for feature selection by imposing structured sparsity using weighted kernels. We propose fully automated methods for selecting all tuning parameters and, in particular, adapt kernel shrinkage ideas for ridge parameter selection. Numerical studies demonstrate the superior classification performance of the proposed approach compared to existing nonparametric classifiers.
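As a rough illustration of the ideas in the abstract, the sketch below ridge-regresses two-group optimal scores on a centered kernel matrix and uses feature weights inside the kernel so that a zero weight drops a feature. It is not the authors' implementation. The Gaussian form of the weighted kernel, the function names, the fixed ridge value `gamma`, the hand-picked weights `w`, and the midpoint classification rule are all assumptions made here for illustration, and the automated tuning-parameter selection described in the abstract is not reproduced.

```python
import numpy as np

def weighted_gaussian_kernel(A, B, w):
    # Assumed kernel form: k_w(a, b) = exp(-sum_j w_j (a_j - b_j)^2).
    # A weight w_j = 0 removes feature j from the kernel entirely.
    sq_diff = (A[:, None, :] - B[None, :, :]) ** 2
    return np.exp(-(sq_diff @ w))

def fit_kernel_optimal_scoring(X, y, w, gamma):
    # y must be a NumPy array of 0/1 labels; gamma > 0 is the ridge parameter.
    n = len(y)
    n1, n2 = np.sum(y == 0), np.sum(y == 1)
    # Two-group optimal scores: they sum to zero and average to unit square.
    scores = np.where(y == 0, np.sqrt(n2 / n1), -np.sqrt(n1 / n2))
    K = weighted_gaussian_kernel(X, X, w)
    C = np.eye(n) - np.full((n, n), 1.0 / n)  # centering matrix
    # Kernel ridge regression of the scores on the centered kernel matrix.
    alpha = np.linalg.solve(C @ K @ C + n * gamma * np.eye(n), scores)
    beta = C @ alpha
    # Project the training points and record the two class means.
    f_train = K @ beta
    m0, m1 = f_train[y == 0].mean(), f_train[y == 1].mean()
    return {"X": X, "w": w, "beta": beta, "m0": m0, "m1": m1}

def predict(model, X_new):
    # Assumed rule: assign each point to the class whose projected mean is closer.
    f = weighted_gaussian_kernel(X_new, model["X"], model["w"]) @ model["beta"]
    closer_to_0 = np.abs(f - model["m0"]) <= np.abs(f - model["m1"])
    return np.where(closer_to_0, 0, 1)

# Synthetic data (made up for this sketch): only feature 0 separates the groups.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))
y = np.repeat([0, 1], 40)
X[y == 1, 0] += 6.0

w = np.array([0.1, 0.0, 0.0])  # sparse weights: features 1 and 2 are switched off
model = fit_kernel_optimal_scoring(X, y, w, gamma=0.01)
train_acc = np.mean(predict(model, X) == y)
print(f"training accuracy: {train_acc:.2f}")
```

Because the weights enter the kernel directly, setting a weight to zero makes the fitted classifier invariant to that feature, which is one way to read "structured sparsity using weighted kernels". How the weights and `gamma` would be chosen from data is left open here.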