CLAug 11, 2025

Can You Trick the Grader? Adversarial Persuasion of LLM Judges

arXiv:2508.07805v17 citationsh-index: 6EMNLP
Originality Incremental advance
AI Analysis

This reveals a critical vulnerability in LLM-as-a-Judge pipelines, impacting automated evaluation systems in practical settings, though it is incremental as it builds on known issues of LLM bias.

The study tackled the problem of whether persuasive language can bias LLM judges in scoring mathematical reasoning tasks, finding that embedded persuasion techniques inflated scores for incorrect solutions by up to 8% on average, with Consistency causing the most distortion, and this vulnerability persisted despite countermeasures.

As large language models take on growing roles as automated evaluators in practical settings, a critical question arises: Can individuals persuade an LLM judge to assign unfairly high scores? This study is the first to reveal that strategically embedded persuasive language can bias LLM judges when scoring mathematical reasoning tasks, where correctness should be independent of stylistic variation. Grounded in Aristotle's rhetorical principles, we formalize seven persuasion techniques (Majority, Consistency, Flattery, Reciprocity, Pity, Authority, Identity) and embed them into otherwise identical responses. Across six math benchmarks, we find that persuasive language leads LLM judges to assign inflated scores to incorrect solutions, by up to 8% on average, with Consistency causing the most severe distortion. Notably, increasing model size does not substantially mitigate this vulnerability. Further analysis demonstrates that combining multiple persuasion techniques amplifies the bias, and pairwise evaluation is likewise susceptible. Moreover, the persuasive effect persists under counter prompting strategies, highlighting a critical vulnerability in LLM-as-a-Judge pipelines and underscoring the need for robust defenses against persuasion-based attacks.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes