Duluth at SemEval-2019 Task 6: Lexical Approaches to Identify and Categorize Offensive Tweets
This work addresses the detection of offensive language in social media for content moderation, but its contribution is incremental: it applies existing methods to a specific shared task.
The paper describes systems for identifying and categorizing offensive tweets in SemEval-2019 Task 6, using traditional machine learning with lexical features and a rule-based black-list approach. The best systems achieved mid-rank placements: 57th of 103 in task A, 39th of 75 in task B, and 44th of 65 in task C.
This paper describes the Duluth systems that participated in SemEval-2019 Task 6, Identifying and Categorizing Offensive Language in Social Media (OffensEval). For the most part these systems took traditional Machine Learning approaches that built classifiers from lexical features found in manually labeled training data. However, our most successful system for classifying a tweet as offensive (or not) was a rule-based black-list approach, and we also experimented with combining the training data from two different but related SemEval tasks. Our best systems in each of the three OffensEval tasks placed in the middle of the comparative evaluation, ranking 57th of 103 in task A, 39th of 75 in task B, and 44th of 65 in task C.
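To make the rule-based idea concrete, the following is a minimal sketch of how a black-list classifier for the offensive-or-not decision might look. It is not the Duluth implementation: the word list, the tokenizer, the function names, and the `OFF`/`NOT` label strings are all assumptions chosen for illustration.

```python
import re

# Hypothetical black-list for illustration only; the word list actually
# used by the Duluth system is not given in this abstract.
BLACKLIST = {"idiot", "stupid", "moron"}


def tokenize(tweet):
    """Lowercase the tweet and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", tweet.lower())


def classify(tweet, blacklist=BLACKLIST):
    """Return "OFF" if any token is on the black-list, otherwise "NOT"."""
    if any(token in blacklist for token in tokenize(tweet)):
        return "OFF"
    return "NOT"
```

A rule of this kind needs no training data, which is one reason it is a natural baseline against classifiers learned from lexical features; its coverage depends entirely on the contents of the list.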