CL, Jun 12, 2021

Study of sampling methods in sentiment analysis of imbalanced data

arXiv:2106.06673v1
Originality: Synthesis-oriented
AI Analysis

This work tackles the problem of imbalanced data in sentiment analysis for specific domains, but it is incremental as it applies existing sampling methods without introducing new techniques.

The study applied various sampling methods to address class imbalance in sentiment analysis on two highly imbalanced datasets, Epicurious reviews and Planned Parenthood comments, using word n-grams and information gain for feature selection.

This work investigates the application of sampling methods to sentiment analysis on two highly imbalanced datasets. One dataset contains online user reviews from the cooking platform Epicurious; the other contains comments given to the Planned Parenthood organization. In both datasets, the classes of interest are rare. Word n-grams are used as features. A feature selection technique based on information gain is first applied to reduce the feature space to a manageable size. Several sampling methods are then applied to mitigate the class imbalance, and their effects are analyzed.
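
The pipeline described above (word n-grams, information-gain feature selection, then resampling) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy texts, the binary "n-gram present" encoding, the helper names (`word_ngrams`, `information_gain`, `select_top_k`, `random_oversample`), and the choice of random oversampling as the sampling method are all assumptions made for illustration, since the abstract does not name the specific methods used.

```python
import math
import random
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return the set of word n-grams (n = 1..n_max) of a lowercased text."""
    tokens = text.lower().split()
    grams = set()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(" ".join(tokens[i:i + n]))
    return grams


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(docs, labels, feature):
    """IG of a binary 'n-gram present' feature: H(Y) - H(Y | feature)."""
    present = [y for d, y in zip(docs, labels) if feature in d]
    absent = [y for d, y in zip(docs, labels) if feature not in d]
    n = len(labels)
    conditional = ((len(present) / n) * entropy(present)
                   + (len(absent) / n) * entropy(absent))
    return entropy(labels) - conditional


def select_top_k(docs, labels, k):
    """Keep the k n-grams with the highest information gain (ties: alphabetical)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab,
                    key=lambda f: (-information_gain(docs, labels, f), f))
    return ranked[:k]


def random_oversample(rows, labels, seed=0):
    """Duplicate random minority rows until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels


# Hypothetical toy data: four majority-class texts, one rare-class text.
texts = ["great recipe loved it", "great cake", "loved the cake",
         "great dinner", "too salty awful"]
labels = ["pos", "pos", "pos", "pos", "neg"]

docs = [word_ngrams(t) for t in texts]
features = select_top_k(docs, labels, k=3)

# Encode each document as a binary vector over the selected n-grams.
rows = [[int(f in d) for f in features] for d in docs]
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

In practice, resampling would be applied to the training split only, so that duplicated minority examples do not leak into the evaluation data.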
