CRLGOct 24, 2025

Soft Instruction De-escalation Defense

arXiv:2510.21057v13 citationsh-index: 10
Originality Incremental advance
AI Analysis

This addresses security vulnerabilities in LLM agents interacting with untrusted data, though it is incremental as it raises the bar but is not infallible.

The paper tackles the problem of prompt injections in tool-augmented LLM agents by proposing SIC, a soft instruction control method that iteratively sanitizes inputs, achieving a 15% attack success rate against strong adversaries.

Large Language Models (LLMs) are increasingly deployed in agentic systems that interact with an external environment; this makes them susceptible to prompt injections when dealing with untrusted data. To overcome this limitation, we propose SIC (Soft Instruction Control)-a simple yet effective iterative prompt sanitization loop designed for tool-augmented LLM agents. Our method repeatedly inspects incoming data for instructions that could compromise agent behavior. If such content is found, the malicious content is rewritten, masked, or removed, and the result is re-evaluated. The process continues until the input is clean or a maximum iteration limit is reached; if imperative instruction-like content remains, the agent halts to ensure security. By allowing multiple passes, our approach acknowledges that individual rewrites may fail but enables the system to catch and correct missed injections in later steps. Although immediately useful, worst-case analysis shows that SIC is not infallible; strong adversary can still get a 15% ASR by embedding non-imperative workflows. This nonetheless raises the bar.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes