CL · Feb 19, 2024

Understanding Fine-grained Distortions in Reports of Scientific Findings

Amelie Wührl, Dustin Wright, Roman Klinger, Isabelle Augenstein

arXiv:2402.12431v1 · 15.930 citations · h-index: 43 · ACL

Originality: Incremental advance

AI Analysis

This work addresses distorted science communication, a problem that affects both the general public and researchers. It is an incremental advance: it extends prior work, which focused on individual aspects of distortion or relied on unpaired data.

The study tackles fine-grained distortions in science communication. It annotates 1,600 findings from academic papers paired with the corresponding findings in public reports, establishes baselines for detecting distortions, and analyzes how prevalent they are. Tweets distort findings more often than news reports, and fine-tuned models outperform few-shot LLM prompting.

Distorted science communication harms individuals and society as it can lead to unhealthy behavior change and decrease trust in scientific institutions. Given the rapidly increasing volume of science communication in recent years, a fine-grained understanding of how findings from scientific publications are reported to the general public, and methods to detect distortions from the original work automatically, are crucial. Prior work focused on individual aspects of distortions or worked with unpaired data. In this work, we make three foundational contributions towards addressing this problem: (1) annotating 1,600 instances of scientific findings from academic papers paired with corresponding findings as reported in news articles and tweets with respect to four characteristics: causality, certainty, generality and sensationalism; (2) establishing baselines for automatically detecting these characteristics; and (3) analyzing the prevalence of changes in these characteristics in both human-annotated and large-scale unlabeled data. Our results show that scientific findings frequently undergo subtle distortions when reported. Tweets distort findings more often than science news reports. Detecting fine-grained distortions automatically is a challenging task. In our experiments, fine-tuned task-specific models consistently outperform few-shot LLM prompting.
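The prevalence analysis in contribution (3) amounts to counting how often a characteristic changes between a paper's finding and its reported version, grouped by where it was reported. The sketch below is minimal and assumes a hypothetical schema: each paired instance records, for each of the four characteristics, whether the reported finding is "more", "same" or "less" than the original. This is not the authors' annotation scheme or code. The names `PairedFinding`, `make_pair`, `distortion_rates` and `change_rates` are illustrative, and the four example pairs are invented toy inputs, not data from the study.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass

CHARACTERISTICS = ("causality", "certainty", "generality", "sensationalism")


@dataclass
class PairedFinding:
    """A paper finding paired with its reported version (hypothetical schema)."""
    source: str             # where the finding was reported, e.g. "news" or "tweet"
    labels: dict[str, str]  # per characteristic: "more", "same" or "less" than the paper


def make_pair(source: str, **changed: str) -> PairedFinding:
    """Build a pair whose characteristics default to "same" unless overridden."""
    labels = {c: "same" for c in CHARACTERISTICS}
    labels.update(changed)
    return PairedFinding(source, labels)


def distortion_rates(pairs: list[PairedFinding]) -> dict[str, float]:
    """Per source, the fraction of pairs where at least one characteristic changed."""
    totals: Counter[str] = Counter()
    distorted: Counter[str] = Counter()
    for pair in pairs:
        totals[pair.source] += 1
        if any(pair.labels[c] != "same" for c in CHARACTERISTICS):
            distorted[pair.source] += 1
    return {src: distorted[src] / totals[src] for src in totals}


def change_rates(pairs: list[PairedFinding], source: str) -> dict[str, float]:
    """Per characteristic, the fraction of one source's pairs where it changed."""
    subset = [p for p in pairs if p.source == source]
    if not subset:
        return {}
    return {
        c: sum(p.labels[c] != "same" for p in subset) / len(subset)
        for c in CHARACTERISTICS
    }


# Invented toy inputs, for illustration only.
pairs = [
    make_pair("news"),
    make_pair("news", causality="more"),
    make_pair("tweet", certainty="more", generality="more", sensationalism="more"),
    make_pair("tweet", sensationalism="more"),
]
print(distortion_rates(pairs))       # {'news': 0.5, 'tweet': 1.0}
print(change_rates(pairs, "tweet"))  # {'causality': 0.0, 'certainty': 0.5, 'generality': 0.5, 'sensationalism': 1.0}
```

Grouping the counts by source makes a comparison such as tweets versus news reports fall out of the same tallies. Scoring large-scale unlabeled data would additionally require a detector that predicts these labels automatically.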

