Left, Right, or Center? Evaluating LLM Framing in News Classification and Generation
This work addresses concerns about political bias in AI-generated journalism by documenting a systematic tendency toward centrist framing that could affect media coverage and public perception. Its contribution is incremental, as it evaluates existing models.
The study investigates political framing in LLM-generated news summaries by evaluating nine state-of-the-art models. It finds pervasive ideological center-collapse in both classification and generation tasks. Grok 4 is identified as the most ideologically expressive generator, while Claude Sonnet 4.5 and Llama 3.1 achieve the strongest bias-rating performance among commercial and open-weight models, respectively.
Large Language Model (LLM) based summarization and text generation are increasingly used for producing and rewriting text, raising concerns about political framing in journalism where subtle wording choices can shape interpretation. Across nine state-of-the-art LLMs, we study political framing by testing whether LLMs' classification-based bias signals align with framing behavior in their generated summaries. We first compare few-shot ideology predictions against LEFT/CENTER/RIGHT labels. We then generate "steered" summaries under FAITHFUL, CENTRIST, LEFT, and RIGHT prompts, and score all outputs using a single fixed ideology evaluator. We find pervasive ideological center-collapse in both article-level ratings and generated text, indicating a systematic tendency toward centrist framing. Among evaluated models, Grok 4 is by far the most ideologically expressive generator, while Claude Sonnet 4.5 and Llama 3.1 achieve the strongest bias-rating performance among commercial and open-weight models, respectively.
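To make the notion of center-collapse concrete, the sketch below compares a set of predicted ideology labels against reference LEFT/CENTER/RIGHT labels. It is a minimal illustration, not the paper's evaluation code: the function names, the definition of collapse as the excess share of CENTER predictions over the reference share, and the toy data are all assumptions introduced here.

```python
from collections import Counter

LABELS = ("LEFT", "CENTER", "RIGHT")


def label_shares(labels):
    """Return the fraction of items carrying each ideology label."""
    counts = Counter(labels)
    total = len(labels)
    return {name: counts[name] / total for name in LABELS}


def center_collapse(gold, predicted):
    """Excess CENTER share in predictions relative to the reference labels.

    Assumed metric for illustration: a positive value means the model
    assigns CENTER more often than the reference labels do.
    """
    return label_shares(predicted)["CENTER"] - label_shares(gold)["CENTER"]


def accuracy(gold, predicted):
    """Fraction of items whose predicted label matches the reference label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Toy data, invented for illustration only (not results from the study).
gold = ["LEFT", "LEFT", "CENTER", "RIGHT", "RIGHT", "CENTER"]
predicted = ["CENTER", "LEFT", "CENTER", "CENTER", "RIGHT", "CENTER"]

print(f"accuracy: {accuracy(gold, predicted):.3f}")
print(f"center collapse: {center_collapse(gold, predicted):+.3f}")
```

The same two functions could in principle be applied to the labels that a fixed evaluator assigns to summaries generated under each of the FAITHFUL, CENTRIST, LEFT, and RIGHT prompts, which would show whether steered outputs move away from CENTER at all.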