CL Jun 12, 2023

LTCR: Long-Text Chinese Rumor Detection Dataset

Ziyang Ma, Mengsha Liu, Guian Fang, Ying Shen

arXiv:2306.07201v21.35 citations, h-index: 4, Has Code

Originality: Synthesis-oriented

AI Analysis

This paper provides a dataset and a model for detecting long-text Chinese rumors, particularly COVID-19 misinformation. The contribution is incremental, since it builds on existing rumor detection work.

The paper tackles the problem of detecting false information in long Chinese texts, especially related to COVID-19, by introducing the LTCR dataset with 1,729 real and 500 fake news items, and proposes a model that achieves 95.85% accuracy, 90.91% fake news recall, and 90.60% F-score.

False information can spread quickly on social media, negatively influencing citizens' behavior and responses to social events. To improve detection of fake news, especially long texts, which are harder to detect in full, we propose LTCR, a Long-Text Chinese Rumor detection dataset. LTCR provides a valuable resource for accurately detecting misinformation, especially complex fake news related to COVID-19. The dataset consists of 1,729 real and 500 fake news items, with average lengths of approximately 230 and 152 characters, respectively. We also propose a Salience-aware Fake News Detection Model, which achieves the highest accuracy (95.85%), fake news recall (90.91%), and F-score (90.60%) on the dataset. (https://github.com/Enderfga/DoubleCheck)
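The class counts in the abstract help put the reported metrics in context. The short Python sketch below is an illustration, not code from the paper or its repository. It assumes the evaluation data has the same real-to-fake ratio as the full dataset, which the abstract does not state. The function names are hypothetical. It shows why accuracy alone says little on an imbalanced dataset and why fake news recall is reported separately.

```python
from typing import Tuple

REAL_COUNT = 1729  # real news items, as stated in the abstract
FAKE_COUNT = 500   # fake news items, as stated in the abstract


def class_shares(real: int, fake: int) -> Tuple[float, float]:
    """Return the (real, fake) fractions of the whole dataset."""
    total = real + fake
    return real / total, fake / total


def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of actual positives that the classifier retrieved."""
    return true_pos / (true_pos + false_neg)


real_share, fake_share = class_shares(REAL_COUNT, FAKE_COUNT)

# Hypothetical trivial baseline: label every item "real".
# Its accuracy equals the share of real items, yet it finds no fake news.
majority_accuracy = real_share
majority_fake_recall = recall(true_pos=0, false_neg=FAKE_COUNT)

print(f"real share: {real_share:.2%}, fake share: {fake_share:.2%}")
print(f"majority-class accuracy: {majority_accuracy:.2%}")
print(f"majority-class fake recall: {majority_fake_recall:.2%}")
```

Under that assumption, a classifier that never flags anything would already be right on roughly three quarters of the items while catching no fake news, so the reported accuracy is best read alongside the fake news recall and F-score.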
