ML | LG | Dec 19, 2024

Statistical Undersampling with Mutual Information and Support Points

arXiv:2412.14527v14 citations | h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses class imbalance in machine learning datasets and represents an incremental improvement for practical applications.

The paper tackles class imbalance in classification tasks by introducing two novel undersampling methods, one based on mutual information and one on support points, which outperform traditional undersampling techniques and achieve higher balanced classification accuracy.

Class imbalance and distributional differences in large datasets present significant challenges for classification tasks in machine learning, often leading to biased models and poor predictive performance for minority classes. This work introduces two novel undersampling approaches: mutual information-based stratified simple random sampling and support points optimization. These methods prioritize representative data selection, effectively minimizing information loss. Empirical results across multiple classification tasks demonstrate that our methods outperform traditional undersampling techniques, achieving higher balanced classification accuracy. These findings highlight the potential of combining statistical concepts with machine learning to address class imbalance in practical applications.
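The abstract does not spell out the sampling procedure, so the following is only an illustrative sketch of what mutual information-based stratified undersampling could look like, not the authors' algorithm. It assumes equal-width binning, a plug-in mutual information estimate, strata taken from the single most informative feature, and proportional allocation across strata. All function names (`equal_width_bins`, `mutual_information`, `mi_stratified_undersample`) are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def equal_width_bins(values, n_bins):
    """Map each value to an equal-width bin index in [0, n_bins)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mi_stratified_undersample(X, y, majority, n_keep, n_bins=4, seed=0):
    """Keep every non-majority row plus n_keep majority rows.

    Assumption (not from the paper): strata are the bins of the feature
    with the highest estimated mutual information with the label.
    Returns sorted row indices.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    binned = [equal_width_bins([row[j] for row in X], n_bins)
              for j in range(n_features)]
    scores = [mutual_information(b, y) for b in binned]
    best = max(range(n_features), key=scores.__getitem__)

    # Group majority-class rows by their bin on the selected feature.
    strata = defaultdict(list)
    for i, label in enumerate(y):
        if label == majority:
            strata[binned[best][i]].append(i)
    n_major = sum(len(rows) for rows in strata.values())
    if n_keep > n_major:
        raise ValueError("n_keep exceeds the number of majority rows")

    # Proportional allocation; leftovers go to the largest remainders.
    keys = sorted(strata)
    alloc = {k: n_keep * len(strata[k]) // n_major for k in keys}
    leftover = n_keep - sum(alloc.values())
    by_remainder = sorted(
        keys, key=lambda k: (n_keep * len(strata[k])) % n_major, reverse=True
    )
    for k in by_remainder[:leftover]:
        alloc[k] += 1

    # Simple random sampling without replacement inside each stratum.
    kept = [i for i, label in enumerate(y) if label != majority]
    for k in keys:
        kept.extend(rng.sample(strata[k], alloc[k]))
    return sorted(kept)


# Toy data: feature 0 separates the classes, feature 1 is noise.
demo_rng = random.Random(1)
X_demo = [[demo_rng.uniform(0, 1), demo_rng.random()] for _ in range(90)]
X_demo += [[demo_rng.uniform(2, 3), demo_rng.random()] for _ in range(10)]
y_demo = [0] * 90 + [1] * 10

# Shrink the 90 majority rows to 10 so each class ends up with 10 rows.
kept = mi_stratified_undersample(X_demo, y_demo, majority=0, n_keep=10)
print(Counter(y_demo[i] for i in kept))
```

Sampling within strata rather than from the majority class as a whole keeps the retained rows spread across the range of the selected feature, which is one plausible reading of "representative data selection". The support points approach is a separate optimization and is not sketched here.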

