Xu at SemEval-2022 Task 4: Pre-BERT Neural Network Methods vs Post-BERT RoBERTa Approach for Patronizing and Condescending Language Detection
This work addresses the detection of harmful language in NLP applications; its contribution is incremental, since it compares existing methods on a single benchmark.
The paper tackles the detection of patronizing and condescending language in SemEval-2022 Task 4 and finds that pre-BERT neural network systems perform worse than RoBERTa models; the top RoBERTa system achieves an F1-score of 54.64 in subtask 1 and 30.03 in subtask 2.
This paper describes my participation in SemEval-2022 Task 4: Patronizing and Condescending Language Detection. I participate in both subtasks, Patronizing and Condescending Language (PCL) Identification and Patronizing and Condescending Language Categorization, with the main focus on subtask 1. The experiments compare pre-BERT neural network (NN) based systems against the post-BERT pretrained language model RoBERTa. The NN-based systems in the experiments perform worse on the task than the pretrained language models. The top-performing RoBERTa system ranks 26th out of 78 teams (F1-score: 54.64) in subtask 1 and 23rd out of 49 teams (F1-score: 30.03) in subtask 2.