CLAILGApr 29, 2025

Chain-of-Defensive-Thought: Structured Reasoning Elicits Robustness in Large Language Models against Reference Corruption

arXiv:2504.20769v14 citationsh-index: 49
Originality Incremental advance
AI Analysis

This addresses the issue of vulnerability to adversarial attacks in LLMs for users relying on accurate information retrieval, though it is incremental as it builds on existing chain-of-thought methods.

The paper tackled the problem of improving robustness in large language models against reference corruption, such as prompt injection attacks, by introducing chain-of-defensive-thought prompting, which maintained GPT-4o's accuracy at 50% compared to a drop to 3% with standard prompting in the Natural Questions task.

Chain-of-thought prompting has demonstrated great success in facilitating the reasoning abilities of large language models. In this work, we explore how these enhanced reasoning abilities can be exploited to improve the robustness of large language models in tasks that are not necessarily reasoning-focused. In particular, we show how a wide range of large language models exhibit significantly improved robustness against reference corruption using a simple method called chain-of-defensive-thought, where only a few exemplars with structured and defensive reasoning are provided as demonstrations. Empirically, the improvements can be astounding, especially given the simplicity and applicability of the method. For example, in the Natural Questions task, the accuracy of GPT-4o degrades from 60% to as low as 3% with standard prompting when 1 out of 10 references provided is corrupted with prompt injection attacks. In contrast, GPT-4o using chain-of-defensive-thought prompting maintains an accuracy of 50%.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes