syrapropa at SemEval-2020 Task 11: BERT-based Models Design For Propagandistic Technique and Span Detection
This work addresses propaganda identification in news, a problem of interest to NLP researchers, but its contribution is incremental: it applies existing BERT methods to a specific competition task.
The paper tackles the detection of propaganda techniques in news articles by developing BERT-based models for span identification and technique classification, achieving F1-measures of 0.4711 (seventh place) and 0.6783 (third place), respectively, on the development set.
This paper describes the BERT-based models proposed for two subtasks in SemEval-2020 Task 11: Detection of Propaganda Techniques in News Articles. We first build a model for Span Identification (SI) based on SpanBERT, and improve detection with a deeper model and a sentence-level representation. We then develop a hybrid model for Technique Classification (TC). The hybrid model is composed of three submodels: two BERT models with different training methods and a feature-based Logistic Regression model. We address the imbalanced dataset by adjusting the cost function. We rank seventh in the SI subtask (F1-measure of 0.4711) and third in the TC subtask (F1-measure of 0.6783) on the development set.
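
The abstract does not say how the cost function is adjusted for class imbalance. As a minimal sketch of one common option, and not necessarily the scheme used in the paper, the example below weights each class by the "balanced" heuristic N / (K * count) and applies those weights inside a cross-entropy loss. The function names and the labels `technique_a` and `technique_b` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a weight per class: N / (K * count_of_class).

    N is the number of examples and K the number of distinct classes,
    so rare classes receive larger weights. This weighting scheme is an
    assumption; the paper's abstract does not specify one.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean negative log-likelihood.

    probs: list of dicts mapping class name -> predicted probability.
    labels: list of gold class names, aligned with probs.
    weights: dict mapping class name -> weight.
    """
    total, norm = 0.0, 0.0
    for p, gold in zip(probs, labels):
        w = weights[gold]
        # Clamp the probability to avoid log(0).
        total += -w * math.log(max(p[gold], 1e-12))
        norm += w
    return total / norm


# Hypothetical, imbalanced toy label set: 8 frequent vs. 2 rare examples.
labels = ["technique_a"] * 8 + ["technique_b"] * 2
weights = inverse_frequency_weights(labels)

# Uniform predictions for every example.
probs = [{"technique_a": 0.5, "technique_b": 0.5} for _ in labels]
loss = weighted_cross_entropy(probs, labels, weights)
```

With this weighting, an error on a rare technique contributes more to the loss than an error on a frequent one, which is the general effect a cost-sensitive adjustment aims for. In a BERT fine-tuning setup the same weights would typically be passed to the framework's cross-entropy loss rather than computed by hand.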