H-CoT
H-CoT: Hijacking the Chain-of-Thought Safety Reasoning Mechanism to Jailbreak Large Reasoning Models, Including OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking
LLM reasoning / chain-of-thought · first seen Feb 18, 2025
superseded — cited as a baseline and beaten by newer methods
2 papers critique it · 2 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites H-CoT as a baseline.
“it remains ineffective on the latest o3 and o4-Mini”
— Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts
“While effective in some cases, such approaches suffer from three major limitations. First, their reliance on fixed templates restricts diversity, making attacks easier to detect or defend against. Second, they lack adaptability to different models and contexts, limiting their robustness. Third, their overall effectiveness is constrained, as static designs fail to fully exploit the dynamic nature of CoT reasoning.”
— Reasoning as an Attack Surface: Adaptive Evolutionary CoT Jailbreaks for LLMs
Beaten on benchmarks
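Head-to-head results where a newer method reports beating H-CoT. Values are copied from each source paper's tables and should be verified against the cited paper. The two papers appear to report ASR on different scales (fractions in one, percentages in the other), so compare values only within the same paper.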
- Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts: DH-CoT-D12 beats H-CoT
  - ASR [GPT-5]: 0.90 vs 0.54
  - ASR [GPT-5.1]: 0.92 vs 0.84
  - ASR [o3]: 0.40 vs 0.20
  - ASR [o4-Mini]: 0.58 vs 0.40
  - ASR [Gemini-2.5-Flash-Thinking]: 1.00 vs 0.98
  - ASR [Claude-3-7-Sonnet-Thinking]: 0.82 vs 0.14
  - ASR [Claude-Sonnet-4-Thinking]: 0.20 vs 0.08
  - ASR [DeepSeek-R1]: 1.00 vs 0.56
- Reasoning as an Attack Surface: Adaptive Evolutionary CoT Jailbreaks for LLMs: AE-CoT (ours) beats H-CoT
  - ASR [o1-mini]: 92 vs 54
  - HS [o1-mini]: 70.4 vs 60
  - ASR [o3-mini]: 88 vs 86
  - HS [o3-mini]: 72.0 vs 70.4
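The head-to-head figures above are easier to compare as margins. The following is a minimal sketch, not part of either cited paper, that computes how far each newer method leads H-CoT. It assumes the first paper's ASR values are fractions (scaled by 100 to give percentage points) and that the second paper's ASR and HS values are already in their reported units; the `ROWS` layout, the `scale` column, and the `margin_table` helper are hypothetical names introduced for illustration.

```python
# Rows: (newer method, metric, model, newer value, H-CoT value, scale).
# scale=100 converts a fraction to percentage points; scale=1 keeps reported units.
# ASSUMPTION: the scales are inferred from the magnitudes of the listed values.
ROWS = [
    ("DH-CoT-D12", "ASR", "GPT-5", 0.90, 0.54, 100),
    ("DH-CoT-D12", "ASR", "GPT-5.1", 0.92, 0.84, 100),
    ("DH-CoT-D12", "ASR", "o3", 0.40, 0.20, 100),
    ("DH-CoT-D12", "ASR", "o4-Mini", 0.58, 0.40, 100),
    ("DH-CoT-D12", "ASR", "Gemini-2.5-Flash-Thinking", 1.00, 0.98, 100),
    ("DH-CoT-D12", "ASR", "Claude-3-7-Sonnet-Thinking", 0.82, 0.14, 100),
    ("DH-CoT-D12", "ASR", "Claude-Sonnet-4-Thinking", 0.20, 0.08, 100),
    ("DH-CoT-D12", "ASR", "DeepSeek-R1", 1.00, 0.56, 100),
    ("AE-CoT", "ASR", "o1-mini", 92, 54, 1),
    ("AE-CoT", "HS", "o1-mini", 70.4, 60, 1),
    ("AE-CoT", "ASR", "o3-mini", 88, 86, 1),
    ("AE-CoT", "HS", "o3-mini", 72.0, 70.4, 1),
]


def margin_table(rows):
    """Return (method, metric, model, margin) tuples, largest margin first."""
    table = [
        # Round to one decimal to hide floating-point noise in the subtraction.
        (method, metric, model, round((new - old) * scale, 1))
        for method, metric, model, new, old, scale in rows
    ]
    return sorted(table, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    for method, metric, model, margin in margin_table(ROWS):
        print(f"{method:<11} {metric:<3} {model:<27} +{margin}")
```

The sort mixes ASR and HS rows purely for convenience; HS margins are in that paper's own units and are not directly comparable with ASR percentage points.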
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- May 23, 2026