UBC-NLP at SemEval-2019 Task 6: Ensemble Learning of Offensive Content With Enhanced Training Data
This work addresses the detection of offensive content on social media, a task that matters for moderation and safety. Its contribution is incremental: it applies existing methods within a specific shared-task competition.
The paper tackles learning offensive content on Twitter from limited, imbalanced data by combining data enhancement methods with ensemble classifiers. Its system ranks 6th of 75 teams in sub-task B (0.706 macro F1-score) and 9th of 65 teams in sub-task C (0.587 macro F1-score).
We examine learning offensive content on Twitter with limited, imbalanced data. To this end, we investigate the utility of various data enhancement methods combined with a host of classical ensemble classifiers. Among the 75 participating teams in SemEval-2019 sub-task B, our system ranks 6th (with 0.706 macro F1-score). For sub-task C, among the 65 participating teams, our system ranks 9th (with 0.587 macro F1-score).
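Because the reported results are macro F1-scores on imbalanced data, it may help to see how that metric is computed. The following is a minimal sketch, not the evaluation code used in the shared task: the function name `macro_f1` and the toy labels `"majority"` and `"minority"` are assumptions made purely for illustration. Macro F1 averages the per-class F1 scores with equal weight, so a rare class counts as much as a frequent one.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, made-up labels: three "majority" examples and one "minority" example.
y_true = ["majority", "majority", "majority", "minority"]

# A classifier that always predicts the majority class.
always_majority = ["majority"] * 4
accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)

print(accuracy)                           # accuracy looks acceptable
print(macro_f1(y_true, always_majority))  # macro F1 penalizes the missed class
```

On this toy input the always-majority predictor reaches 0.75 accuracy but a macro F1 of roughly 0.43, which illustrates why macro F1 is a natural choice when classes are imbalanced.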