LG · AI · CR · Aug 23, 2025

POT: Inducing Overthinking in LLMs via Black-Box Iterative Optimization

Xinyu Li, Tianjin Huang, Ronghui Mu, Xiaowei Huang, Gaojie Jin

arXiv:2508.19277v1 · 9 citations · h-index: 9

Originality: Incremental advance

AI Analysis

This work addresses a novel attack surface in LLM security. It makes overthinking attacks more practical and covert by removing external dependencies, though it is an incremental advance that builds on prior overthinking concepts.

The paper exploits the computational cost of Chain-of-Thought reasoning in LLMs. It proposes POT, a black-box attack framework that induces overthinking, and reports that POT outperforms other methods in experiments across diverse models and datasets.

Recent advances in Chain-of-Thought (CoT) prompting have substantially enhanced the reasoning capabilities of large language models (LLMs), enabling sophisticated problem-solving through explicit multi-step reasoning traces. However, these enhanced reasoning processes introduce novel attack surfaces, particularly vulnerabilities to computational inefficiency through unnecessarily verbose reasoning chains that consume excessive resources without corresponding performance gains. Prior overthinking attacks typically require restrictive conditions including access to external knowledge sources for data poisoning, reliance on retrievable poisoned content, and structurally obvious templates that limit practical applicability in real-world scenarios. To address these limitations, we propose POT (Prompt-Only OverThinking), a novel black-box attack framework that employs LLM-based iterative optimization to generate covert and semantically natural adversarial prompts, eliminating dependence on external data access and model retrieval. Extensive experiments across diverse model architectures and datasets demonstrate that POT achieves superior performance compared to other methods.
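The abstract describes POT only as "LLM-based iterative optimization" under black-box access, so the loop below is a minimal, hypothetical sketch of that general shape, not the authors' algorithm. It assumes a greedy hill-climbing search in which a proposer suggests prompt rewrites and a scorer returns a number observable through queries alone (for example, the length of the target model's reasoning trace). The names `iterative_prompt_search`, `toy_propose`, and `toy_score` are illustrative inventions, and the toy stand-ins query no model.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not taken from the paper):
#   propose(prompt, history) -> candidate rewrites; per the abstract, POT uses an LLM here
#   score(prompt) -> a value observed only via black-box queries,
#                    e.g. the length of the target model's reasoning trace
Proposer = Callable[[str, List[Tuple[str, float]]], List[str]]
Scorer = Callable[[str], float]


def iterative_prompt_search(seed: str, propose: Proposer, score: Scorer,
                            iterations: int = 5) -> Tuple[str, float]:
    """Greedy black-box search that keeps the highest-scoring prompt seen so far."""
    best_prompt, best_score = seed, score(seed)
    history: List[Tuple[str, float]] = [(best_prompt, best_score)]
    for _ in range(iterations):
        for candidate in propose(best_prompt, history):
            candidate_score = score(candidate)  # one black-box query per candidate
            history.append((candidate, candidate_score))
            if candidate_score > best_score:    # accept strict improvements only
                best_prompt, best_score = candidate, candidate_score
    return best_prompt, best_score


# Toy stand-ins so the sketch runs offline; both are hypothetical.
def toy_propose(prompt: str, history: List[Tuple[str, float]]) -> List[str]:
    # Offer one extended rewrite plus the unchanged prompt.
    return [prompt + " Verify each step.", prompt]


def toy_score(prompt: str) -> float:
    # Word count as a crude proxy for "how much reasoning the prompt provokes".
    return float(len(prompt.split()))


if __name__ == "__main__":
    best, s = iterative_prompt_search("What is 2 + 3?", toy_propose, toy_score, iterations=3)
    print(best, s)
```

With the toy stand-ins, each round appends one three-word suffix, so three rounds raise the score from 5.0 to 14.0. A real black-box setting would replace `toy_score` with measured output lengths and, given the abstract's emphasis on covert, semantically natural prompts, would likely also need a naturalness constraint on candidates, which this sketch omits.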
