CL LG | Aug 8, 2023

Studying Socially Unacceptable Discourse Classification (SUD) through different eyes: "Are we on the same page ?"

Bruno Machado Carneiro, Michele Linardi, Julien Longhi

arXiv:2308.04180v1 | 3 citations | h-index: 6 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work aims to improve SUD detection for online content moderation. Its contribution is incremental: it focuses on dataset creation and analysis rather than on new methods.

The authors tackle Socially Unacceptable Discourse (SUD) classification by building a novel corpus of manually annotated texts from diverse online sources. They use this corpus to test how well SUD classifiers generalize across contexts, which yields insights into how annotation choices influence learning and points to open research directions.

We study Socially Unacceptable Discourse (SUD) characterization and detection in online text. We first build and present a novel corpus that contains a large variety of manually annotated texts from different online sources used so far in state-of-the-art machine learning (ML) SUD detection solutions. This global context allows us to test the generalization ability of SUD classifiers that acquire knowledge around the same SUD categories, but from different contexts. From this perspective, we can analyze how (possibly) different annotation modalities influence SUD learning, and we discuss open challenges and open research directions. We also provide several data insights that can support domain experts in the annotation task.
