LG | ML | Dec 30, 2025

Sparse classification with positive-confidence data in high dimensions

arXiv:2512.24443v1 | h-index: 9
Originality: Incremental advance
AI Analysis

This paper addresses weak supervision in high-dimensional settings, a problem relevant to machine learning researchers and practitioners. It is an incremental advance that adapts existing sparse-regularization techniques to a new data scenario.

The paper tackles the problem of high-dimensional classification with only positive-confidence data, proposing a sparse-penalization framework that achieves predictive performance and variable selection accuracy comparable to fully supervised methods.

High-dimensional learning problems, where the number of features exceeds the sample size, often require sparse regularization for effective prediction and variable selection. While established for fully supervised data, these techniques remain underexplored in weak-supervision settings such as Positive-Confidence (Pconf) classification. Pconf learning utilizes only positive samples equipped with confidence scores, thereby avoiding the need for negative data. However, existing Pconf methods are ill-suited for high-dimensional regimes. This paper proposes a novel sparse-penalization framework for high-dimensional Pconf classification. We introduce estimators using convex (Lasso) and non-convex (SCAD, MCP) penalties to address shrinkage bias and improve feature recovery. Theoretically, we establish estimation and prediction error bounds for the L1-regularized Pconf estimator, proving that it achieves near minimax-optimal sparse recovery rates under a Restricted Strong Convexity condition. To solve the resulting composite objective, we develop an efficient proximal gradient algorithm. Extensive simulations demonstrate that our proposed methods achieve predictive performance and variable selection accuracy comparable to fully supervised approaches, effectively bridging the gap between weak supervision and high-dimensional statistics.
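To make the proximal gradient idea concrete, here is a minimal sketch of an L1-penalized Pconf fit. It is not the authors' implementation. The abstract does not spell out the objective, so the sketch assumes the standard empirical Pconf risk with logistic loss and a linear score, (1/n) Σ_i [ ℓ(x_i'β) + ((1 − r_i)/r_i) ℓ(−x_i'β) ] with ℓ(z) = log(1 + exp(−z)), where r_i is the confidence of positive sample x_i. The function names (`pconf_lasso`, `pconf_loss_and_grad`, `soft_threshold`), the fixed step size, and the simulated data are all hypothetical choices made for illustration.

```python
import math
import random


def soft_threshold(z, t):
    """Proximal operator of t * |z| (the Lasso penalty) for a scalar z."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pconf_loss_and_grad(beta, X, r):
    """Assumed smooth part: mean of l(s_i) + (1 - r_i)/r_i * l(-s_i), s_i = x_i'beta."""
    n, p = len(X), len(beta)
    loss, grad = 0.0, [0.0] * p
    for x, ri in zip(X, r):
        w = (1.0 - ri) / ri
        s = sum(xj * bj for xj, bj in zip(x, beta))
        loss += softplus(-s) + w * softplus(s)
        # Derivative of the per-sample loss with respect to the score s.
        c = -sigmoid(-s) + w * sigmoid(s)
        for j in range(p):
            grad[j] += c * x[j]
    return loss / n, [g / n for g in grad]


def pconf_lasso(X, r, lam, n_iter=500):
    """Proximal gradient (ISTA) with a fixed step 1/L.

    L is a conservative upper bound on the gradient's Lipschitz constant:
    the logistic loss has curvature at most 1/4, and 1 + (1 - r_i)/r_i = 1/r_i.
    """
    n, p = len(X), len(X[0])
    L = sum(sum(v * v for v in x) / ri for x, ri in zip(X, r)) / (4.0 * n)
    step = 1.0 / L
    beta = [0.0] * p
    history = []  # penalized objective at each iterate
    for _ in range(n_iter):
        loss, grad = pconf_loss_and_grad(beta, X, r)
        history.append(loss + lam * sum(abs(b) for b in beta))
        # Gradient step on the smooth loss, then the L1 proximal step.
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return beta, history


if __name__ == "__main__":
    random.seed(0)
    n, p = 60, 20
    true_beta = [2.0, -2.0, 1.5] + [0.0] * (p - 3)  # hypothetical sparse signal
    X, r = [], []
    while len(X) < n:
        x = [random.gauss(0.0, 1.0) for _ in range(p)]
        conf = sigmoid(sum(a * b for a, b in zip(x, true_beta)))
        if random.random() < conf:  # keep only positively labeled draws
            X.append(x)
            r.append(max(conf, 0.05))  # clip tiny confidences to bound the weights
    beta_hat, history = pconf_lasso(X, r, lam=0.05)
    print("nonzero coefficients:", [j for j, b in enumerate(beta_hat) if b != 0.0])
```

Only the Lasso case is shown. Under the same assumptions, a SCAD or MCP variant would replace `soft_threshold` with the corresponding thresholding rule, at the cost of a non-convex objective. The fixed step is deliberately conservative; a backtracking line search would usually converge faster.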

