ME | ML | Nov 5, 2021

New Hard-thresholding Rules based on Data Splitting in High-dimensional Imbalanced Classification

arXiv:2111.03306v3
AI Analysis

The paper addresses imbalanced classification, which arises in fields such as biology and medicine, and offers a new method for a known bottleneck.

The paper studies high-dimensional imbalanced classification. It shows that linear discriminant analysis (LDA) ignores the minority class when that class is scarce and the feature space is high-dimensional, and it proposes a new hard-thresholding rule based on data splitting that narrows the gap between the two misclassification rates and is asymptotically optimal.

In binary classification, imbalance refers to situations in which one class is heavily under-represented. This issue arises either from the data collection process or because one class is indeed rare in the population. Imbalanced classification frequently arises in applications such as biology, medicine, engineering, and the social sciences. In this paper, for the first time, we theoretically study the impact of imbalanced class sizes on linear discriminant analysis (LDA) in high dimensions. We show that, due to data scarcity in one class, referred to as the minority class, and the high dimensionality of the feature space, the LDA ignores the minority class, yielding a maximum misclassification rate. We then propose a new construction of hard-thresholding rules based on a data splitting technique that reduces the large difference between the misclassification rates. We show that the proposed method is asymptotically optimal. We further study two well-known sparse versions of the LDA in imbalanced cases. We evaluate the finite-sample performance of the different methods using simulations and by analyzing two real data sets. The results show that our method either outperforms its competitors or has comparable performance based on a much smaller subset of selected features, while being computationally more efficient.
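The effect described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's construction; it is a minimal illustration under stated assumptions: Gaussian classes with identity covariance, a sparse mean difference, a mean-difference (midpoint) rule standing in for the LDA, and a universal threshold of the form sqrt(2 log p (1/n0 + 1/n1)) applied on one half of the data. The function names (`midpoint_rule`, `split_threshold_rule`, `simulate`) and all parameter values are hypothetical choices made for this example.

```python
import numpy as np


def midpoint_rule(X0, X1, features=None):
    """Mean-difference rule (LDA with identity covariance, an assumption here)."""
    if features is None:
        features = np.arange(X0.shape[1])
    m0 = X0[:, features].mean(axis=0)
    m1 = X1[:, features].mean(axis=0)
    d, c = m1 - m0, (m0 + m1) / 2.0

    def predict(X):
        # With no selected features, fall back to the majority class (label 0).
        if len(features) == 0:
            return np.zeros(X.shape[0], dtype=int)
        return ((X[:, features] - c) @ d > 0).astype(int)

    return predict


def split_threshold_rule(X0, X1, rng):
    """Select features on one half of the data, fit the rule on the other half."""
    i0, i1 = rng.permutation(len(X0)), rng.permutation(len(X1))
    h0, h1 = len(X0) // 2, len(X1) // 2
    A0, B0 = X0[i0[:h0]], X0[i0[h0:]]
    A1, B1 = X1[i1[:h1]], X1[i1[h1:]]
    p = X0.shape[1]
    diff = A1.mean(axis=0) - A0.mean(axis=0)
    # Illustrative universal threshold; the paper's threshold may differ.
    t = np.sqrt(2.0 * np.log(p) * (1.0 / len(A0) + 1.0 / len(A1)))
    selected = np.flatnonzero(np.abs(diff) > t)
    return midpoint_rule(B0, B1, selected), selected


def simulate(p=5000, n0=400, n1=20, n_test=200, signal=2.0, k=10, seed=0):
    """Return (majority error, minority error) for both rules on fresh test data."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(p)
    mu[:k] = signal  # only the first k features separate the classes
    X0 = rng.standard_normal((n0, p))       # majority class, label 0
    X1 = rng.standard_normal((n1, p)) + mu  # minority class, label 1
    T0 = rng.standard_normal((n_test, p))
    T1 = rng.standard_normal((n_test, p)) + mu

    def errors(rule):
        return float(np.mean(rule(T0) == 1)), float(np.mean(rule(T1) == 0))

    full = midpoint_rule(X0, X1)
    split, selected = split_threshold_rule(X0, X1, rng)
    return {"lda": errors(full), "split": errors(split),
            "n_selected": len(selected)}


if __name__ == "__main__":
    print(simulate())
```

In this setting the full-dimensional rule accumulates far more estimation noise from the 20 minority observations than from the 400 majority observations, which shifts the decision boundary so that nearly every test point is assigned to the majority class. Selecting features on one half and estimating the rule on the other keeps the selection step independent of the estimates used for classification, which is one common motivation for data splitting.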

