Mar 2, 2025

Output Length Effect on DeepSeek-R1's Safety in Forced Thinking

arXiv:2503.01923v1 | 8 citations | h-index: 3
Originality: Incremental advance
AI Analysis

This paper addresses safety vulnerabilities in LLMs under adversarial conditions and proposes incremental adjustments for specific scenarios.

The study investigates how output length affects the safety of DeepSeek-R1 in adversarial Forced Thinking scenarios. It finds that longer outputs can improve safety through self-correction, but that certain attack types exploit extended generations. It therefore recommends controlling output length dynamically to balance reasoning and security.

Large Language Models (LLMs) have demonstrated strong reasoning capabilities, but their safety under adversarial conditions remains a challenge. This study examines the impact of output length on the robustness of DeepSeek-R1, particularly in Forced Thinking scenarios. We analyze responses across various adversarial prompts and find that while longer outputs can improve safety through self-correction, certain attack types exploit extended generations. Our findings suggest that output length should be dynamically controlled to balance reasoning effectiveness and security. We propose reinforcement learning-based policy adjustments and adaptive token length regulation to enhance LLM safety.
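
As a rough illustration of what adaptive token length regulation could look like, the sketch below maps a risk score to an output-token cap. Everything in it is an assumption made for illustration and is not taken from the paper: the function name `token_budget`, the idea of a single risk score in [0, 1] (read here as the estimated chance that a prompt is an attack that exploits long generations), the linear mapping, and the default limits of 128 and 2048 tokens.

```python
def token_budget(risk_score: float, min_tokens: int = 128, max_tokens: int = 2048) -> int:
    """Map a risk score in [0, 1] to an output-token cap.

    Hypothetical policy, not the paper's method: a low score keeps the full
    budget so the model has room to self-correct, while a high score shrinks
    the budget to limit attacks that exploit extended generations.
    """
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be in [0, 1]")
    if min_tokens > max_tokens:
        raise ValueError("min_tokens must not exceed max_tokens")
    # Linear interpolation between the two limits.
    return round(max_tokens - risk_score * (max_tokens - min_tokens))


if __name__ == "__main__":
    for score in (0.0, 0.5, 1.0):
        print(score, token_budget(score))
```

In practice the returned value would be passed as the generation length limit of whatever inference stack is in use. A learned policy, such as the reinforcement learning-based adjustment the abstract mentions, could replace the linear rule.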

