DWReCO at CheckThat! 2023: Enhancing Subjectivity Detection through Style-based Data Sampling
This work addresses subjectivity detection for multilingual fact-checking, but its contribution is incremental: it combines existing methods with data augmentation.
The paper tackles class imbalance in subjectivity detection by generating additional training data with GPT-3, using style-based prompts drawn from a journalistic subjectivity checklist. It finds that style-based oversampling outperforms paraphrasing in Turkish and English, while GPT-3 sometimes produces lacklustre style-based text in non-English languages.
This paper describes our submission for the subjectivity detection task at the CheckThat! Lab. To tackle class imbalance in the task, we generated additional training material with GPT-3 models, using prompts of different styles from a subjectivity checklist based on a journalistic perspective. We used the extended training set to fine-tune language-specific transformer models. Our experiments in English, German and Turkish demonstrate that different subjective styles are effective across all languages. In addition, we observe that style-based oversampling is better than paraphrasing in Turkish and English. Lastly, the GPT-3 models sometimes produce lacklustre results when generating style-based texts in non-English languages.
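The oversampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the style names, the prompt wording, the label strings, and the `generate` callable (a stand-in for whatever text-generation model is used) are all assumptions made for illustration. The sketch only shows how a minority class might be topped up to the majority-class size by cycling over seed sentences and styles.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (sentence, label)

# Hypothetical style names; the paper's actual checklist is not reproduced here.
STYLES = ["exaggerated", "emotional", "partisan"]


def build_prompt(style: str, sentence: str) -> str:
    """Assumed prompt wording: ask for a rewrite of `sentence` in one style."""
    return f"Rewrite the following sentence in a {style} subjective style:\n{sentence}"


def oversample_minority(
    data: List[Example],
    minority_label: str,
    generate: Callable[[str], str],
    styles: Sequence[str] = STYLES,
) -> List[Example]:
    """Add generated minority-class examples until the minority class
    is as large as the largest class."""
    counts = Counter(label for _, label in data)
    deficit = max(counts.values()) - counts[minority_label]
    seeds = [text for text, label in data if label == minority_label]
    augmented = list(data)
    if not seeds or deficit <= 0:
        return augmented  # nothing to rewrite, or already balanced
    for i in range(deficit):
        seed = seeds[i % len(seeds)]     # cycle over minority sentences
        style = styles[i % len(styles)]  # cycle over styles
        augmented.append((generate(build_prompt(style, seed)), minority_label))
    return augmented
```

In practice `generate` would wrap a call to a language model. For a dry run it can be any function from a prompt string to a string, for example `lambda prompt: prompt.splitlines()[-1].upper()`. The augmented list would then be used to fine-tune a language-specific classifier.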