LG · AI · ML · Mar 29, 2018

Modified SMOTE Using Mutual Information and Different Sorts of Entropies

arXiv:1803.11002v1 · 2 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses data imbalance issues in machine learning, particularly for classification tasks, but it is incremental as it builds on existing SMOTE techniques.

The authors tackle the problem of imbalanced datasets by proposing four enhanced SMOTE oversampling methods that incorporate mutual information and different entropies. They report improved accuracy over previous methods on 11 datasets, and include a case study on transportation data with an imbalance ratio of 36.

SMOTE is an oversampling technique for balancing datasets, and it is used as a pre-processing step for learning algorithms. In this paper, four new enhanced SMOTE methods are proposed. They use an improved version of KNN in which the attribute weights are first defined by mutual information and then replaced, in turn, by maximum entropy, Renyi entropy and Tsallis entropy. These four pre-processing methods are combined with 1NN and J48 classifiers, and their performance is compared with previous methods on 11 imbalanced datasets from the KEEL repository. The results show that these pre-processing methods improve accuracy compared with previously established work. In addition, as a case study, the first pre-processing method is applied to transportation data from the Tehran-Bazargan Highway in Iran, with an imbalance ratio (IR) of 36.
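The idea of a mutual-information-weighted KNN inside SMOTE can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the histogram-based mutual information estimate, the normalisation of the weights, the weighted Euclidean distance, and the names `mutual_information`, `mi_weights` and `weighted_smote` are all hypothetical choices made for this example. The entropy-based variants are not shown, since the abstract does not say how those weights are computed. The toy data are synthetic and merely mimic an imbalance ratio of 36 (180 majority vs. 5 minority samples).

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Histogram estimate of I(X; Y) in nats (an assumed estimator)."""
    edges = np.histogram_bin_edges(x, bins=bins)
    x_bin = np.digitize(x, edges[1:-1])            # bin ids in 0..bins-1
    classes, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((bins, len(classes)))
    np.add.at(joint, (x_bin, y_idx), 1.0)          # joint counts
    joint /= joint.sum()                           # joint probabilities
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

def mi_weights(X, y, bins=10):
    """One non-negative weight per attribute, normalised to sum to 1."""
    w = np.array([mutual_information(X[:, j], y, bins) for j in range(X.shape[1])])
    total = w.sum()
    if total <= 0:                                 # fall back to uniform weights
        return np.full(X.shape[1], 1.0 / X.shape[1])
    return w / total

def weighted_smote(X, y, minority_label, n_new, k=5, bins=10, seed=0):
    """SMOTE interpolation with neighbours found under an MI-weighted distance."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    if len(X_min) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(X_min) - 1)
    w = mi_weights(X, y, bins)
    # Weighted Euclidean distance between all pairs of minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt(np.sum(w * diff ** 2, axis=2))
    np.fill_diagonal(dist, np.inf)                 # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    # Pick a base sample, one of its k neighbours, and interpolate between them.
    base = rng.integers(0, len(X_min), size=n_new)
    pick = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[pick] - X_min[base])

# Toy data with IR = 180 / 5 = 36 (synthetic, for illustration only).
data_rng = np.random.default_rng(42)
X_majority = data_rng.normal(0.0, 1.0, size=(180, 3))
X_minority = data_rng.normal(2.0, 0.5, size=(5, 3))
X = np.vstack([X_majority, X_minority])
y = np.array([0] * 180 + [1] * 5)

# Generate enough synthetic minority samples to balance the two classes.
synthetic = weighted_smote(X, y, minority_label=1, n_new=175)
```

The weights only change which neighbours are selected. The interpolation step itself is the standard SMOTE convex combination of a minority sample and one of its neighbours.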

