CL · May 28

Spurious Prompts: Can Irrelevant Prompts Steer Large Language Models?

Pawel Batorski, Abtin Pourhadi, Jerzy Sarosiek, Przemyslaw Spurek, Paul Swoboda

arXiv:2605.29678 · 85.7 · Has Code

Predicted impact: top 56% in CL · last 90 days
Originality: Incremental advance

AI Analysis

The paper reveals a new vulnerability in LLMs: prompts irrelevant to the task can systematically steer model behavior, which raises concerns for safety and reliability.

The paper shows that prompts semantically unrelated to a task can steer large language models, often matching or outperforming standard prompting baselines and task-aware prompt optimization across reasoning and QA benchmarks with models from 0.8B to 27B parameters.

Large language models are highly sensitive to prompts, but this sensitivity is usually studied through task-relevant instructions, demonstrations, or reasoning cues. In this paper, we study a different form of prompt sensitivity: whether prompts that are semantically unrelated to the task can nevertheless steer model behavior. We call such prompts spurious prompts and show that they are surprisingly effective. We also propose a simple black-box search procedure for discovering them. Across reasoning and question-answering benchmarks, using models ranging from 0.8B to 27B parameters and spanning three model families, we show that spurious prompts can improve performance, often matching or outperforming standard prompting baselines and task-aware prompt optimization. We further show that they can steer models toward unintended behaviors, such as repeatedly selecting the first answer option, producing incorrect answers, or returning an even, prime, or small number, all without the prompt explicitly instructing the model to do so. These findings reveal a new kind of prompt sensitivity: LLMs can be systematically steered by prompts that are unrelated to the task they are asked to solve. Our code is available at https://github.com/Batorskq/spurious
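The abstract does not spell out the search procedure, so the following is only a minimal sketch of what a black-box search for a spurious prompt could look like, not the authors' method. It assumes query-only access to a model, a small labeled dev set, and a pool of task-unrelated candidate sentences. Every name here (`accuracy`, `search_spurious_prompt`, `toy_model`, the candidate sentences) is hypothetical, and `toy_model` is a deterministic stand-in for a real LLM call.

```python
import random
from typing import Callable, Sequence

Example = tuple[str, str]  # (question, gold answer)


def accuracy(model: Callable[[str], str], prompt: str,
             dev_set: Sequence[Example]) -> float:
    """Fraction of dev questions answered correctly with `prompt` prepended."""
    correct = sum(model(f"{prompt}\n\n{q}").strip() == a for q, a in dev_set)
    return correct / len(dev_set)


def search_spurious_prompt(model: Callable[[str], str],
                           candidates: Sequence[str],
                           dev_set: Sequence[Example],
                           budget: int = 20,
                           seed: int = 0) -> tuple[str, float]:
    """Random search: score up to `budget` candidates using model outputs only.

    Assumption: plain random search stands in for whatever search the
    paper actually uses. The empty prompt serves as the baseline.
    """
    rng = random.Random(seed)
    pool = rng.sample(list(candidates), k=min(budget, len(candidates)))
    best_prompt, best_score = "", accuracy(model, "", dev_set)
    for prompt in pool:
        score = accuracy(model, prompt, dev_set)
        if score > best_score:
            best_prompt, best_score = prompt, score
    return best_prompt, best_score


def toy_model(text: str) -> str:
    """Hypothetical stub: adds correctly only if 'lighthouse' appears."""
    question = text.strip().splitlines()[-1]
    a, b = question.split("+")
    return str(int(a) + int(b)) if "lighthouse" in text else "0"


dev_set = [("2+3", "5"), ("4+4", "8"), ("7+1", "8")]
candidates = [
    "The lighthouse keeper painted the door green.",
    "Bananas ripen faster in paper bags.",
    "Tuesday markets sell old maps.",
]
best_prompt, best_score = search_spurious_prompt(toy_model, candidates, dev_set)
```

In this toy setup the search settles on the sentence about the lighthouse, which has nothing to do with arithmetic, because the stub only answers correctly when that word is present. With a real model, `toy_model` would be replaced by an API or inference call, and the selected prompt should be evaluated on held-out data to avoid overfitting to the dev set.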

