Hierarchical Multi-Instance Multi-Label Learning for Detecting Propaganda Techniques
This work, aimed at NLP researchers, addresses the problem of detecting propaganda techniques in text and offers an incremental improvement by better modeling label dependencies.
The paper frames propaganda technique classification as a Multi-Instance Multi-Label learning problem and incorporates hierarchical label dependencies, achieving a 2.47% absolute improvement in micro-F1 over the shared-task-winning team's model in cross-validation.
Since the introduction of SemEval 2020 Task 11 (Martino et al., 2020a), several approaches have been proposed for classifying propaganda by the rhetorical techniques used to influence readers. These methods, however, classify one span at a time, ignoring dependencies on the labels of other spans within the same context. In this paper, we approach propaganda technique classification as a Multi-Instance Multi-Label (MIML) learning problem (Zhou et al., 2012) and propose a simple RoBERTa-based model (Zhuang et al., 2021) that classifies all spans in an article simultaneously. Further, because annotators classified the spans by following a decision tree, there is an inherent hierarchical relationship among the techniques, which existing approaches ignore. We incorporate these hierarchical label dependencies by adding an auxiliary classifier for each node in the decision tree to the training objective and ensembling the predictions of the original and auxiliary classifiers at test time. Overall, our model yields an absolute improvement of 2.47% micro-F1 over the model from the shared-task-winning team in a cross-validation setup and is the best-performing non-ensemble model on the shared task leaderboard.
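The abstract does not specify how the auxiliary node classifiers are trained or how their outputs are combined with the original classifier, so the following is only a minimal sketch of one plausible reading, under several assumptions. The decision tree (`TREE`, nodes `N1`/`N2`, techniques `T1`..`T4`) is a hypothetical toy, not the actual annotation tree; each node classifier is assumed to be a softmax over that node's children; auxiliary losses are assumed to apply only to nodes on the gold root-to-leaf path, with weight 1; and test-time ensembling is assumed to be a weighted average of the flat technique distribution and the product of node probabilities along each path. The per-span logits are assumed to come from a shared encoder (e.g. RoBERTa) run over the whole article, which is not shown. All function names are illustrative.

```python
import math
from typing import Dict, List, Tuple

# HYPOTHETICAL toy decision tree (not the real SemEval 2020 Task 11 tree).
# Internal nodes map to ordered children; names absent from TREE are leaf techniques.
TREE: Dict[str, List[str]] = {
    "root": ["N1", "N2"],
    "N1": ["T1", "T2"],
    "N2": ["T3", "T4"],
}
LEAVES: List[str] = ["T1", "T2", "T3", "T4"]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gold_path(leaf: str) -> List[Tuple[str, int]]:
    """(node, index of correct child) pairs from the root down to `leaf`."""
    def search(node: str) -> List[Tuple[str, int]]:
        for i, child in enumerate(TREE[node]):
            if child == leaf:
                return [(node, i)]
            if child in TREE:
                rest = search(child)
                if rest:
                    return [(node, i)] + rest
        return []
    return search("root")


def training_loss(leaf_logits: List[float],
                  node_logits: Dict[str, List[float]],
                  gold: str) -> float:
    """Flat cross-entropy plus an auxiliary cross-entropy for each node on the gold path."""
    loss = -math.log(softmax(leaf_logits)[LEAVES.index(gold)])
    for node, child_idx in gold_path(gold):
        loss += -math.log(softmax(node_logits[node])[child_idx])
    return loss


def path_probabilities(node_logits: Dict[str, List[float]]) -> Dict[str, float]:
    """Leaf probability = product of child probabilities along its root-to-leaf path."""
    probs: Dict[str, float] = {}

    def descend(node: str, acc: float) -> None:
        for child, p in zip(TREE[node], softmax(node_logits[node])):
            if child in TREE:
                descend(child, acc * p)
            else:
                probs[child] = acc * p

    descend("root", 1.0)
    return probs


def ensemble(leaf_logits: List[float],
             node_logits: Dict[str, List[float]],
             weight: float = 0.5) -> Dict[str, float]:
    """Average the flat and hierarchical distributions over techniques."""
    flat = softmax(leaf_logits)
    hier = path_probabilities(node_logits)
    return {t: weight * flat[i] + (1 - weight) * hier[t] for i, t in enumerate(LEAVES)}


def classify_article(spans: List[Tuple[List[float], Dict[str, List[float]]]]) -> List[str]:
    """Label every span of one article in a single call (MIML-style batch)."""
    return [max(scores, key=scores.get)
            for scores in (ensemble(leaf, nodes) for leaf, nodes in spans)]
```

Because each node's softmax sums to one and the toy tree covers every technique, the path products also form a distribution over techniques, so the averaged scores stay normalized. Other combination rules (e.g. multiplying rather than averaging) would be equally consistent with the abstract's wording.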