CL AI LG

Oct 21, 2020

ReSCo-CC: Unsupervised Identification of Key Disinformation Sentences

Soumya Suvra Ghosal, Deepak P, Anna Jurek-Loughrey

arXiv:2010.10836v1

0.34 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of disinformation in domains such as health, particularly around COVID-19, by providing a tool that pinpoints the critical misleading content within an article. The advance is incremental, since it builds on existing NLP techniques.

The paper tackles the problem of identifying key disinformation sentences within untrustworthy articles. It proposes an unsupervised three-phase statistical NLP method and reports, based on an empirical evaluation, that the method identifies core disinformation effectively.

Disinformation is often presented in long textual articles, especially in domains such as health, as seen in relation to COVID-19. These articles are typically observed to contain a number of trustworthy sentences among which the core disinformation sentences are scattered. In this paper, we propose a novel unsupervised task of identifying sentences containing key disinformation within a document that is known to be untrustworthy. We design a three-phase statistical NLP solution for the task, which starts by embedding sentences within a bespoke feature space designed for the task. Sentences represented using those features are then clustered, after which the key sentences are identified through proximity scoring. We also curate a new dataset with sentence-level disinformation scores to aid evaluation for this task; the dataset is being made publicly available to facilitate further research. Based on a comprehensive empirical evaluation against techniques from related tasks such as claim detection and summarization, as well as against simplified variants of our proposed approach, we illustrate that our method is able to identify core disinformation effectively.
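
The three phases named in the abstract (embed, cluster, score by proximity) can be illustrated with a minimal sketch. The abstract does not specify the features, the clustering algorithm, or the scoring rule, so everything concrete here is an assumption: normalised bag-of-words vectors stand in for the bespoke feature space, plain k-means stands in for the clustering step, and "closeness to the sentence's own cluster centroid" stands in for proximity scoring. All function names are hypothetical and this is not the authors' implementation.

```python
import math
import re
from collections import Counter


def embed(sentences):
    """Phase 1 stand-in (assumption): L2-normalised bag-of-words vectors."""
    tokenised = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenised:
        vec = [0.0] * len(vocab)
        for tok, count in Counter(toks).items():
            vec[index[tok]] = float(count)
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iterations=20):
    """Phase 2 stand-in (assumption): k-means seeded with the first k vectors."""
    k = max(1, min(k, len(vectors)))
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each sentence to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: distance(v, centroids[c]))
            for v in vectors
        ]
        # Move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def proximity_scores(vectors, labels, centroids):
    """Phase 3 stand-in (assumption): 1 / (1 + distance to own centroid)."""
    return [
        1.0 / (1.0 + distance(v, centroids[lab]))
        for v, lab in zip(vectors, labels)
    ]


def rank_sentences(sentences, k=2):
    """Run the three phases and return (sentence, score) pairs, best first."""
    vectors = embed(sentences)
    labels, centroids = kmeans(vectors, k)
    scores = proximity_scores(vectors, labels, centroids)
    return sorted(zip(sentences, scores), key=lambda pair: pair[1], reverse=True)
```

Calling `rank_sentences(list_of_sentences, k=2)` returns every input sentence paired with a score in (0, 1], sorted from most to least central. Which cluster, if any, corresponds to disinformation is not decided by this sketch; that choice depends on the features and scoring described in the paper itself.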
