CL · May 28

Spurious Prompts: Can Irrelevant Prompts Steer Large Language Models?

arXiv:2605.29678 · 85.7 · Has Code
Predicted impact top 56% in CL · last 90 days · Originality: Incremental advance
AI Analysis

The paper reveals a vulnerability in LLMs: prompts that are irrelevant to the task can systematically steer model behavior, which raises safety and reliability concerns.

The paper shows that prompts semantically unrelated to a task can steer large language models, often matching or outperforming standard prompting baselines and task-aware prompt optimization across reasoning and QA benchmarks with models from 0.8B to 27B parameters.

Large language models are highly sensitive to prompts, but this sensitivity is usually studied through task-relevant instructions, demonstrations, or reasoning cues. In this paper, we study a different form of prompt sensitivity: whether prompts that are semantically unrelated to the task can nevertheless steer model behavior. We call these spurious prompts and show their surprising efficacy. We also propose a simple black-box search procedure for discovering them. Across reasoning and question-answering benchmarks, using models ranging from 0.8B to 27B parameters and spanning three model families, we show that spurious prompts can improve performance, often matching or outperforming standard prompting baselines and task-aware prompt optimization. We further show that they can steer models toward unintended behaviors, such as repeatedly selecting the first answer option, producing incorrect answers, or returning an even, prime, or small number, all without explicitly instructing the model to do so. These findings reveal a new kind of prompt sensitivity: LLMs can be systematically steered by prompts that are unrelated to the task they are asked to solve. Our code is available at https://github.com/Batorskq/spurious
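The abstract mentions a black-box search procedure but does not describe it here. As a rough illustration of what "black-box" means in this setting, the sketch below runs a budgeted random search that only queries a scoring function and never inspects model internals. It is not the authors' procedure: the function name `search_spurious_prompt`, the candidate strings, and the toy scores are all hypothetical placeholders, and in practice `score_fn` would be something like benchmark accuracy when the candidate prompt is prepended to each task input.

```python
import random
from typing import Callable, Sequence, Tuple


def search_spurious_prompt(
    candidates: Sequence[str],
    score_fn: Callable[[str], float],
    budget: int,
    seed: int = 0,
) -> Tuple[str, float]:
    """Budgeted random search using only score_fn queries (black-box).

    Hypothetical sketch; not the procedure from the paper.
    """
    rng = random.Random(seed)
    order = list(candidates)
    rng.shuffle(order)  # visit candidates in random order

    # Baseline: the empty prefix, i.e. the task with no added prompt.
    best_prompt, best_score = "", score_fn("")

    # Evaluate at most `budget` candidates and keep the best one seen.
    for prompt in order[:budget]:
        score = score_fn(prompt)
        if score > best_score:
            best_prompt, best_score = prompt, score
    return best_prompt, best_score


# Made-up scores standing in for a real evaluation; these are NOT results.
TOY_SCORES = {
    "": 0.50,
    "The tide comes in at noon.": 0.62,
    "Seven green kettles.": 0.47,
    "Maps fold badly.": 0.55,
}


def toy_score(prompt: str) -> float:
    """Placeholder for 'accuracy on a dev set with this prefix'."""
    return TOY_SCORES[prompt]


CANDIDATES = [p for p in TOY_SCORES if p]
best_prompt, best_score = search_spurious_prompt(
    CANDIDATES, toy_score, budget=len(CANDIDATES)
)
```

With the budget equal to the number of candidates, every candidate is scored once, so the search returns the highest-scoring entry of the toy table. A smaller budget trades coverage for fewer model queries.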

