CL · AI · Feb 11, 2025

Caught in the Web of Words: Do LLMs Fall for Spin in Medical Literature?

arXiv:2502.07963v3 · 4 citations · h-index: 28 · CHIL
Originality: Incremental advance
AI Analysis

This work matters to clinicians and patients: it shows that LLMs used to synthesize medical evidence are vulnerable to biased reporting (spin), which could affect healthcare decisions.

The study investigated whether large language models (LLMs) are susceptible to spin in the abstracts of medical papers. Across the board, the 22 evaluated LLMs were more affected by spin than humans, and they could carry it into their outputs, such as plain language summaries. They could also recognize spin, however, and could be prompted to mitigate its impact.

Medical research faces well-documented challenges in translating novel treatments into clinical practice. Publishing incentives encourage researchers to present "positive" findings, even when empirical results are equivocal. Consequently, authors often spin study results, especially in article abstracts. Such spin can influence clinicians' interpretation of evidence and may affect patient care decisions. In this study, we ask whether the interpretation of trial results offered by Large Language Models (LLMs) is similarly affected by spin. This is important because LLMs are increasingly used to trawl through and synthesize published medical evidence. We evaluated 22 LLMs and found that they are, across the board, more susceptible to spin than humans. They may also propagate spin into their outputs: for example, we find evidence that LLMs implicitly incorporate spin into the plain language summaries they generate. We also find, however, that LLMs are generally capable of recognizing spin and can be prompted in ways that mitigate its impact on their outputs.

Code implementations: 1 repo