LG · Jan 7, 2025

Neighbor displacement-based enhanced synthetic oversampling for multiclass imbalanced data

arXiv:2501.04099v1 · 1 citation · h-index: 4 · Computers, Materials & Continua
Originality: Highly original
AI Analysis

The work addresses data imbalance in practical applications. However, because it is a hybrid method built on existing oversampling techniques, it appears incremental.

The paper tackles imbalanced multiclass datasets by proposing NDESO, a hybrid oversampling method. NDESO first moves noisy data points closer to their centroids and then applies random oversampling. It was compared against 14 alternatives on nine classifiers, using synthetic data and 20 real-world datasets, and it achieved the highest average G-mean score and the lowest statistical mean rank.
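The comparison rests on two measures, average G-mean and mean rank. This page does not define either one. The sketch below therefore assumes two common conventions. Multiclass G-mean is taken as the geometric mean of per-class recall, so a classifier that ignores any single class scores 0. Mean rank is computed Friedman-style: methods are ranked within each dataset (rank 1 = best), tied methods share the average of their ranks, and the ranks are then averaged across datasets. The function names `g_mean` and `mean_rank` are illustrative and do not come from the paper's code.

```python
import numpy as np


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed multiclass G-mean)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Recall for every class that appears in the ground truth.
    recalls = np.array([np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)])
    return float(np.prod(recalls) ** (1.0 / len(recalls)))


def mean_rank(scores):
    """Average rank per method; rows are datasets, columns are methods.

    Higher score is better, so the best method in a row gets rank 1.
    Tied scores receive the average of the ranks they span.
    """
    scores = np.asarray(scores, dtype=float)
    ranks = np.empty_like(scores)
    for i, row in enumerate(scores):
        for j, v in enumerate(row):
            better = np.sum(row > v)  # methods strictly ahead of this one
            ties = np.sum(row == v)   # includes the method itself
            ranks[i, j] = better + (ties + 1) / 2.0
    return ranks.mean(axis=0)


if __name__ == "__main__":
    # Per-class recall: 3/4, 1/2, 2/2, so G-mean = 0.375 ** (1/3), about 0.721.
    print(g_mean([0, 0, 0, 0, 1, 1, 2, 2], [0, 0, 0, 1, 1, 0, 2, 2]))
    # Always predicting the majority class gives a G-mean of 0.0.
    print(g_mean([0, 0, 0, 0, 1, 1, 2, 2], [0] * 8))
```

The majority-only example shows why G-mean, and not plain accuracy, is the usual headline metric for imbalanced data. That predictor is 50% accurate on these labels, yet its G-mean is 0.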

Imbalanced multiclass datasets pose challenges for machine learning algorithms. These datasets often contain minority classes that are important for accurate prediction. Existing methods still suffer from sparse data and may not accurately represent the original data patterns, which introduces noise and degrades model performance. This paper proposes a hybrid method called Neighbor Displacement-based Enhanced Synthetic Oversampling (NDESO). It applies a displacement strategy to noisy data points: for each such point, it computes the average distance to the point's neighbors and moves the point closer to its centroid. Random oversampling is then performed to balance the dataset. Extensive evaluations compare the method against 14 alternatives on nine classifiers, using synthetic data and 20 real-world datasets with varying imbalance ratios. The results show that our method outperforms its competitors in average G-mean score and achieves the lowest statistical mean rank, which indicates that it is well suited to addressing data imbalance in practical applications.
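The abstract outlines two steps: displace noisy points, then randomly oversample. The sketch below illustrates that pipeline in Python with NumPy. It is not the authors' implementation, and it fills several details that the abstract leaves open with explicit assumptions:

- A point counts as noisy if any of its k nearest neighbors has a different label.
- "Its centroid" is the mean of its same-class neighbors. If it has none, the mean of the rest of its class is used instead.
- The displacement step equals the point's average neighbor distance, capped so that the point never overshoots the centroid.

All function names (`displace_noisy_points`, `random_oversample`, `ndeso_sketch`) and the parameters `k` and `seed` are hypothetical.

```python
import numpy as np


def displace_noisy_points(X, y, k=5):
    """Hypothetical displacement step; the noise rule and step size are assumptions."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    k = min(k, len(X) - 1)  # a point is never its own neighbor
    # Brute-force pairwise Euclidean distances, with self-distances excluded.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    nbrs = np.argsort(dists, axis=1)[:, :k]

    X_new = X.copy()
    for i in range(len(X)):
        nbr_labels = y[nbrs[i]]
        if np.all(nbr_labels == y[i]):
            continue  # assumed "clean": every neighbor shares the label
        same = nbrs[i][nbr_labels == y[i]]
        if len(same) == 0:
            # Assumed fallback: the rest of the point's class (excluding itself).
            same = np.flatnonzero((y == y[i]) & (np.arange(len(y)) != i))
            if len(same) == 0:
                continue  # singleton class: nothing to move toward
        centroid = X[same].mean(axis=0)       # uses original, undisplaced positions
        avg_dist = dists[i, nbrs[i]].mean()   # average distance to the k neighbors
        direction = centroid - X[i]
        gap = np.linalg.norm(direction)
        if gap > 0:
            # Step of length avg_dist toward the centroid, capped at the centroid.
            X_new[i] = X[i] + direction * min(1.0, avg_dist / gap)
    return X_new


def random_oversample(X, y, seed=0):
    """Duplicate random samples of each smaller class up to the largest class size."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for c, n in zip(classes, counts):
        if n < target:
            idx = rng.choice(np.flatnonzero(y == c), size=target - n, replace=True)
            X_parts.append(X[idx])
            y_parts.append(y[idx])
    return np.vstack(X_parts), np.concatenate(y_parts)


def ndeso_sketch(X, y, k=5, seed=0):
    """Displace assumed-noisy points, then balance the classes by duplication."""
    X_disp = displace_noisy_points(X, y, k)
    return random_oversample(X_disp, y, seed)


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0], [0.05, 0.05], [5.0, 5.0], [5.0, 5.1]])
    y = np.array([0, 0, 0, 1, 1, 1])
    # Point 3 (class 1) sits inside class 0, so it is nudged toward (5.0, 5.05).
    print(displace_noisy_points(X, y, k=2)[3])
```

Each step length is tied to the local neighbor distance. As a result, a point moves only a short way in dense regions, which fits the abstract's description of moving noisy points "closer to" their centroids rather than snapping them onto the centroids.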

Code Implementations: 1 repo