LGApr 17, 2025

iHHO-SMOTe: A Cleansed Approach for Handling Outliers and Reducing Noise to Improve Imbalanced Data Classification

arXiv:2504.12850v13 citationsh-index: 3
Originality Incremental advance
AI Analysis

This addresses imbalanced data classification for machine learning practitioners, but it is incremental as it builds on SMOTE with noise reduction techniques.

The paper tackles the problem of noise and outliers in imbalanced data classification by proposing iHHO-SMOTe, which cleanses data before oversampling, resulting in AUC scores exceeding 0.99 and F1-scores consistently above 0.967.

Classifying imbalanced datasets remains a significant challenge in machine learning, particularly with big data where instances are unevenly distributed among classes, leading to class imbalance issues that impact classifier performance. While Synthetic Minority Over-sampling Technique (SMOTE) addresses this challenge by generating new instances for the under-represented minority class, it faces obstacles in the form of noise and outliers during the creation of new samples. In this paper, a proposed approach, iHHO-SMOTe, which addresses the limitations of SMOTE by first cleansing the data from noise points. This process involves employing feature selection using a random forest to identify the most valuable features, followed by applying the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm to detect outliers based on the selected features. The identified outliers from the minority classes are then removed, creating a refined dataset for subsequent oversampling using the hybrid approach called iHHO-SMOTe. The comprehensive experiments across diverse datasets demonstrate the exceptional performance of the proposed model, with an AUC score exceeding 0.99, a high G-means score of 0.99 highlighting its robustness, and an outstanding F1-score consistently exceeding 0.967. These findings collectively establish Cleansed iHHO-SMOTe as a formidable contender in addressing imbalanced datasets, focusing on noise reduction and outlier handling for improved classification models.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes