CL CY HC SI · Mar 4, 2025

Limited Effectiveness of LLM-based Data Augmentation for COVID-19 Misinformation Stance Detection

Eun Cheol Choi, Ashwin Balasubramanian, Jinhu Qi, Emilio Ferrara

arXiv:2503.02328v1 · 9.66 citations · h-index: 6 · Has Code · WWW

Originality: Synthesis-oriented

AI Analysis

This work addresses misinformation detection for public health, but its contribution is incremental: it reports the limited effectiveness of a new augmentation approach.

The study tested controllable misinformation generation using large language models as data augmentation for COVID-19 misinformation stance detection. It found that performance gains over traditional augmentation methods were often minimal and inconsistent, primarily because of built-in LLM safeguards.

Misinformation surrounding emerging outbreaks poses a serious societal threat, making robust countermeasures essential. One promising approach is stance detection (SD), which identifies whether social media posts support or oppose misleading claims. In this work, we finetune classifiers on COVID-19 misinformation SD datasets consisting of claims and corresponding tweets. Specifically, we test controllable misinformation generation (CMG) using large language models (LLMs) as a method for data augmentation. While CMG demonstrates the potential for expanding training datasets, our experiments reveal that performance gains over traditional augmentation methods are often minimal and inconsistent, primarily due to built-in safeguards within LLMs. We release our code and datasets to facilitate further research on misinformation detection and generation.
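To make the augmentation setup concrete, here is a minimal sketch of how a claim-tweet stance dataset might be expanded with a simple rule-based perturbation. It is an illustration under stated assumptions, not the authors' pipeline: the `StanceExample` type, the `random_swap` and `augment` helpers, and the toy data are all hypothetical, and random word swapping stands in for "traditional augmentation methods", which the abstract does not enumerate. A CMG-style variant would presumably replace `random_swap` with an LLM call conditioned on the claim and the desired stance.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class StanceExample:
    """One training instance: a claim, a tweet about it, and a stance label."""
    claim: str
    tweet: str
    label: str  # hypothetical label set, e.g. "support" or "oppose"


def random_swap(text: str, rng: random.Random, n_swaps: int = 1) -> str:
    """Swap n_swaps random pairs of words (a simple rule-based perturbation)."""
    words = text.split()
    if len(words) < 2:
        return text  # nothing to swap
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return " ".join(words)


def augment(examples, rng, copies=1):
    """Return the originals plus `copies` perturbed variants of each example.

    Only the tweet is perturbed; the claim and the stance label are kept,
    on the assumption that a word swap does not change the stance.
    """
    out = list(examples)
    for ex in examples:
        for _ in range(copies):
            out.append(StanceExample(ex.claim, random_swap(ex.tweet, rng), ex.label))
    return out


# Toy placeholder data (not taken from the paper's datasets).
train = [
    StanceExample("placeholder claim A", "i fully agree with this claim", "support"),
    StanceExample("placeholder claim B", "this claim is simply not true", "oppose"),
]
augmented = augment(train, random.Random(0), copies=2)
```

The augmented list would then be passed to whatever classifier fine-tuning routine is in use. Seeding the `random.Random` instance keeps the augmented set reproducible across runs.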
