LG ML · Jan 27, 2025

Enhancing Synthetic Oversampling for Imbalanced Datasets Using Proxima-Orion Neighbors and q-Gaussian Weighting Technique

Pankaj Yadav, Vivek Vijay, Gulshan Sihag

arXiv:2501.15790v17.11 citations · h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses the challenge of imbalanced datasets in machine learning, which is a common issue in domains like medical diagnosis, but it appears to be an incremental improvement over existing oversampling methods.

The authors propose an oversampling algorithm that uses Proxima-Orion neighbors and q-Gaussian weighting to generate synthetic minority class instances. In experiments on 50 datasets they report improved classification performance, and they also apply the method to a real-world medical dataset of Indian patients with sarcopenia.

In this article, we propose a novel oversampling algorithm to increase the number of instances of the minority class in an imbalanced dataset. We select two instances, Proxima and Orion, from the set of all minority class instances, based on a combination of relative distance weights and density estimation of majority class instances. Furthermore, the q-Gaussian distribution is used as a weighting mechanism to produce new synthetic instances to improve the representation and diversity. We conduct a comprehensive experiment on 42 datasets extracted from KEEL software and eight datasets from the UCI ML repository to evaluate the usefulness of the proposed (PO-QG) algorithm. The Wilcoxon signed-rank test is used to compare the proposed algorithm with five other existing algorithms. The test results show that the proposed technique improves the overall classification performance. We also apply the PO-QG algorithm to a dataset of Indian patients with sarcopenia.
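To give a feel for what q-Gaussian weighting can look like, here is a minimal sketch in Python. It is not the authors' PO-QG procedure: the abstract does not say how Proxima and Orion are chosen or how the weights enter the generation step. The function names, the parameters `q` and `beta`, and the rule of sampling interpolation positions in proportion to the weight are all assumptions made for illustration. Only the q-exponential form of the (unnormalized) q-Gaussian is standard.

```python
import numpy as np

def q_exponential(x, q):
    """Tsallis q-exponential; equals exp(x) when q == 1."""
    x = np.asarray(x, dtype=float)
    if np.isclose(q, 1.0):
        return np.exp(x)
    base = np.maximum(1.0 + (1.0 - q) * x, 0.0)
    return base ** (1.0 / (1.0 - q))

def q_gaussian_weight(d, q=1.5, beta=1.0):
    """Unnormalized q-Gaussian weight for a distance d (1.0 at d == 0)."""
    return q_exponential(-beta * np.square(d), q)

def synthesize(a, b, n_new, q=1.5, beta=1.0, rng=None):
    """Hypothetical generator: place points on the segment from a to b.

    Positions t in [0, 1] are drawn with probability proportional to the
    q-Gaussian weight of t, so points nearer to `a` are favored.
    This sampling rule is an assumption, not the published method.
    """
    rng = np.random.default_rng(rng)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    grid = np.linspace(0.0, 1.0, 101)          # candidate positions
    w = q_gaussian_weight(grid, q=q, beta=beta)
    t = rng.choice(grid, size=n_new, p=w / w.sum())
    return a + t[:, None] * (b - a)

# Two stand-in minority instances playing the roles of the selected pair.
new_points = synthesize([0.0, 0.0], [1.0, 2.0], n_new=5, rng=0)
```

Values of `q` above 1 give heavier tails than an ordinary Gaussian, so positions farther from the first instance keep more weight. That is one plausible way such a weighting could add diversity to the synthetic instances.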
