High Dimensional Classification through $\ell_0$-Penalized Empirical Risk Minimization
This addresses feature selection in high-dimensional classification, but it appears incremental as it builds on existing L0-penalized optimization methods.
The paper tackles high-dimensional binary classification by minimizing empirical misclassification risk with an L0 penalty on feature selection, showing that the method yields sparse solutions close to true sparsity with high probability and provides convergence rates for excess risk.
We consider a high dimensional binary classification problem and construct a classification procedure by minimizing the empirical misclassification risk with a penalty on the number of selected features. We derive non-asymptotic probability bounds on the estimated sparsity as well as on the excess misclassification risk. In particular, we show that our method yields a sparse solution whose l0-norm can be arbitrarily close to true sparsity with high probability and obtain the rates of convergence for the excess misclassification risk. The proposed procedure is implemented via the method of mixed integer linear programming. Its numerical performance is illustrated in Monte Carlo experiments.