High-dimensional classification by sparse logistic regression
This work addresses high-dimensional binary classification, a problem relevant to data scientists, and offers a computationally efficient procedure with theoretical guarantees. It appears incremental, however, since it builds on existing penalized regression and Slope estimator frameworks.
The paper tackles high-dimensional binary classification by proposing a sparse logistic regression method with a complexity penalty for model selection, deriving non-asymptotic bounds for the misclassification excess risk that improve under a low-noise condition, and extending the Slope estimator to obtain a computationally feasible procedure that is rate-optimal under a weighted restricted eigenvalue condition.
We consider high-dimensional binary classification by sparse logistic regression. We propose a model/feature selection procedure based on penalized maximum likelihood with a complexity penalty on the model size and derive non-asymptotic bounds for the resulting misclassification excess risk. The bounds can be tightened under an additional low-noise condition. Remarkably, the proposed complexity penalty is related to the VC-dimension of a set of sparse linear classifiers. Implementing any complexity penalty-based criterion, however, requires a combinatorial search over all possible models. To obtain a model selection procedure that is computationally feasible for high-dimensional data, we extend the Slope estimator to logistic regression and show that, under an additional weighted restricted eigenvalue condition, it is rate-optimal in the minimax sense.
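To make the computational contrast concrete, here is a minimal pure-Python sketch of Slope-penalized logistic regression fitted by proximal gradient descent (ISTA). It avoids any search over subsets: each iteration costs one gradient pass plus a sort. This is an illustration under stated assumptions, not the paper's procedure. The assumptions are labels in {-1, +1}, the averaged negative log-likelihood as the loss, and a hypothetical weight sequence lambda_j = c * sqrt(log(2d/j) / n) with an arbitrary constant c. The solver choice and all function names (`slope_weights`, `prox_sorted_l1`, `fit_slope_logistic`, `predict`) are likewise illustrative. The sorted-L1 proximal step uses the standard reduction: sort, subtract the weights, apply pool-adjacent-violators, then clip at zero.

```python
import math
import random


def slope_weights(d, n, c=0.5):
    # Hypothetical weights: lambda_j = c * sqrt(log(2d/j) / n), j = 1..d.
    # The sequence is positive and nonincreasing, as Slope requires.
    return [c * math.sqrt(math.log(2.0 * d / j) / n) for j in range(1, d + 1)]


def prox_sorted_l1(v, lam):
    """Prox of the sorted-L1 norm sum_j lam[j] * |v|_(j) (lam nonincreasing, >= 0)."""
    order = sorted(range(len(v)), key=lambda i: -abs(v[i]))
    blocks = []  # each block is [sum, count] of pooled entries
    for k, i in enumerate(order):
        blocks.append([abs(v[i]) - lam[k], 1])
        # Pool adjacent blocks while the fitted sequence would fail to be nonincreasing.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] <= blocks[-1][0] / blocks[-1][1]:
            s, cnt = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += cnt
    fitted = []
    for s, cnt in blocks:
        fitted.extend([max(s / cnt, 0.0)] * cnt)  # clip at zero
    x = [0.0] * len(v)
    for k, i in enumerate(order):
        x[i] = math.copysign(fitted[k], v[i])  # restore original order and signs
    return x


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_slope_logistic(X, y, lam, n_iter=300):
    """ISTA for (1/n) sum_i log(1 + exp(-y_i <x_i, beta>)) + sum_j lam[j] |beta|_(j)."""
    n, d = len(X), len(X[0])
    # Conservative Lipschitz bound for the gradient: ||X||_F^2 / (4n) >= ||X||_2^2 / (4n).
    lip = sum(xij * xij for row in X for xij in row) / (4.0 * n)
    step = 1.0 / lip
    beta = [0.0] * d
    for _ in range(n_iter):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            margin = yi * sum(b * xij for b, xij in zip(beta, xi))
            w = -yi * sigmoid(-margin) / n  # d/d<x_i,beta> of the averaged loss term
            for j in range(d):
                grad[j] += w * xi[j]
        # Proximal step: the penalty weights are scaled by the step size.
        beta = prox_sorted_l1([b - step * g for b, g in zip(beta, grad)],
                              [step * lj for lj in lam])
    return beta


def predict(beta, X):
    # Sparse linear classifier: sign of <x, beta>, with ties mapped to +1.
    return [1 if sum(b * xij for b, xij in zip(beta, xi)) >= 0 else -1 for xi in X]


if __name__ == "__main__":
    rng = random.Random(0)
    n, d = 200, 10
    beta_true = [3.0, -3.0] + [0.0] * (d - 2)  # synthetic sparse truth
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(sum(b * x for b, x in zip(beta_true, xi))) else -1
         for xi in X]
    beta_hat = fit_slope_logistic(X, y, slope_weights(d, n))
    acc = sum(p == t for p, t in zip(predict(beta_hat, X), y)) / n
    print([round(b, 2) for b in beta_hat], acc)
```

The weights sort the coefficients by magnitude and penalize larger ones more heavily, which is what lets Slope adapt to the unknown sparsity level without fixing a model size in advance. The constant c and the iteration count would need tuning in practice.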