Trigger Warnings: Bootstrapping a Violence Detector for FanFiction
This work addresses the need for automated content moderation in fanfiction communities, though its contribution is incremental: it applies existing methods to a new domain.
The paper tackles the problem of automatically assigning violence trigger warnings to fanfiction by compiling the first dataset for this task from Archive of Our Own and defining a binary classification task; its SVM and BERT models achieve F1 scores from 0.585 to 0.798.
We present the first dataset and evaluation results on a newly defined computational task of trigger warning assignment. Labeled corpus data has been compiled from narrative works hosted on Archive of Our Own (AO3), a well-known fanfiction site. In this paper, we focus on the most frequently assigned trigger type--violence--and define a document-level binary classification task of whether or not to assign a violence trigger warning to a work of fanfiction, exploiting warning labels provided by AO3 authors. SVM and BERT models trained in four evaluation setups on the corpora we compiled yield $F_1$ results ranging from 0.585 to 0.798, showing violence trigger warning assignment to be a feasible but non-trivial task.
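For readers unfamiliar with the reported metric, the following is a minimal sketch of how $F_1$ can be computed for a document-level binary task like the one defined above. It assumes $F_1$ is measured on the positive class (warning assigned); the abstract does not say which averaging variant was used. The function name `f1_score_binary` and the toy labels are hypothetical, invented for illustration, and are not taken from the paper or its data.

```python
def f1_score_binary(gold, predicted, positive=1):
    """Return (precision, recall, F1) for the positive class.

    Assumption: 1 means "violence warning assigned", 0 means "no warning".
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    # True positives: warning present in gold and predicted.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    # False positives: warning predicted but absent in gold.
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    # False negatives: warning in gold but not predicted.
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy labels for eight documents (illustrative only, not real AO3 data).
gold = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
precision, recall, f1 = f1_score_binary(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Because $F_1$ balances precision against recall, a classifier that flags every document, or none, scores poorly even when one label dominates the corpus.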