Superseded baseline · #289 of 772 most-superseded

external CoT monitoring

LLM reasoning / chain-of-thought

superseded — cited as a baseline and beaten by newer methods

1 paper critiques it · 0 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites external CoT monitoring as a baseline.

“However, a critical concern is that if models are directly penalized for showing deceptive thoughts, they might hide those thoughts rather than truly abandoning the deception.”
— Mitigating Deceptive Alignment via Self-Monitoring

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.

rePIRL · rePIRL: Learn PRM with Inverse RL for LLM Reasoning
May 19, 2026
CoRD · Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding
May 4, 2026