CL LG | Aug 11, 2020

A Comparison of Synthetic Oversampling Methods for Multi-class Text Classification

arXiv:2008.04636v10.323 citations | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses class imbalance in text classification, but it is incremental as it applies existing oversampling methods to a specific domain.

The authors compared synthetic oversampling methods like SMOTE and its variants for multi-class text classification, finding that oversampling generally improves classification quality, with KNN and SVM being more affected by class imbalance than neural networks.

The authors compare oversampling methods for multi-class topic classification. The SMOTE algorithm underlies one of the most popular oversampling methods: it chooses two examples of a minority class and generates a new example from them. The paper compares basic SMOTE with two of its modifications (Borderline SMOTE and ADASYN) and with random oversampling on one text classification task. The classifiers considered are the k-nearest neighbor (KNN) algorithm, the support vector machine (SVM), and three types of neural networks: a feedforward network, a long short-term memory (LSTM) network, and a bidirectional LSTM. The authors combine these classifiers with different text representations and compare the synthetic oversampling methods. In most cases, oversampling can significantly improve classification quality. The authors conclude that, for this task, the quality of KNN and SVM is more influenced by class imbalance than that of the neural networks.
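
The SMOTE step described above (pick two minority-class examples, generate a new one from them) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes documents have already been turned into fixed-length numeric vectors, uses Euclidean distance, and picks the second example from the k nearest minority neighbours of the first, as in basic SMOTE. The function name `smote_sample`, its parameters, and the toy vectors are hypothetical and chosen only for illustration.

```python
import math
import random


def smote_sample(minority, k=5, n_new=10, seed=0):
    """Generate n_new synthetic points from a list of minority-class vectors.

    Basic SMOTE sketch: pick a minority example, pick one of its k nearest
    minority neighbours, and interpolate at a random point on the segment
    between the two.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority points, sorted by distance to base.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy 2-D "document vectors" for a minority class (made-up values).
minority_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sample(minority_vectors, k=2, n_new=5)
```

Every synthetic point lies on a line segment between two existing minority examples, so the method adds new points within the region the class already occupies rather than duplicating existing ones, which is what random oversampling does. Borderline SMOTE and ADASYN differ mainly in how they choose which minority examples to interpolate from, and are not covered by this sketch.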
