LG · Oct 14, 2025

Self-Verifying Reflection Helps Transformers with CoT Reasoning

Zhongwei Yu, Wannian Xia, Xue Yan, Bo Xu, Haifeng Zhang, Yali Du, Jun Wang

arXiv:2510.12157v1 · 11.43 citations · h-index: 8

Originality: Incremental advance

AI Analysis

This work aims to help AI researchers understand and improve reasoning in transformers. Its contribution is incremental: it builds on existing reflection methods with a simplified approach.

The paper addresses the unclear contribution of self-verifying reflection to chain-of-thought reasoning by proposing a minimalistic framework for small transformers. It proves that reflection guarantees improvements when verification errors are properly bounded, and shows that tiny transformers can reach LLM-level performance on tasks such as integer multiplication and Sudoku.

Advanced large language models (LLMs) frequently reflect in reasoning chain-of-thoughts (CoTs), where they self-verify the correctness of current solutions and explore alternatives. However, given recent findings that LLMs detect limited errors in CoTs, how reflection contributes to empirical improvements remains unclear. To analyze this issue, in this paper, we present a minimalistic reasoning framework to support basic self-verifying reflection for small transformers without natural language, which ensures analytic clarity and reduces the cost of comprehensive experiments. Theoretically, we prove that self-verifying reflection guarantees improvements if verification errors are properly bounded. Experimentally, we show that tiny transformers, with only a few million parameters, benefit from self-verification in both training and reflective execution, reaching remarkable LLM-level performance in integer multiplication and Sudoku. Similar to LLM results, we find that reinforcement learning (RL) improves in-distribution performance and incentivizes frequent reflection for tiny transformers, yet RL mainly optimizes shallow statistical patterns without faithfully reducing verification errors. In conclusion, integrating generative transformers with discriminative verification inherently facilitates CoT reasoning, regardless of scaling and natural language.
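To give some intuition for the claim that reflection helps when verification errors are bounded, here is a minimal toy sketch. It is not the paper's framework or theorem. It assumes each attempt is independently correct with probability `p`, a verifier rejects a correct attempt with probability `e_neg` and accepts an incorrect one with probability `e_pos`, and the last attempt in the budget is committed without verification. The function name and all parameter names are hypothetical.

```python
def reflective_accuracy(p, e_neg, e_pos, max_attempts):
    """Probability that the committed answer is correct in a toy
    retry-on-rejection model (an illustrative assumption, not the
    paper's setup).

    p            -- chance that a single attempt is correct
    e_neg        -- chance the verifier rejects a correct attempt
    e_pos        -- chance the verifier accepts an incorrect attempt
    max_attempts -- total attempts allowed (1 means no reflection)
    """
    accept_correct = p * (1.0 - e_neg)      # correct attempt, accepted
    accept_wrong = (1.0 - p) * e_pos        # incorrect attempt, accepted
    reject = 1.0 - accept_correct - accept_wrong

    prob_correct = 0.0
    prob_reach = 1.0  # probability of reaching the current attempt
    for _ in range(max_attempts - 1):
        prob_correct += prob_reach * accept_correct
        prob_reach *= reject
    # The final attempt is committed as-is, with no verification.
    prob_correct += prob_reach * p
    return prob_correct


if __name__ == "__main__":
    base = reflective_accuracy(0.6, 0.1, 0.2, max_attempts=1)
    good = reflective_accuracy(0.6, 0.1, 0.2, max_attempts=5)
    bad = reflective_accuracy(0.6, 0.7, 0.6, max_attempts=5)
    print(f"no reflection: {base:.3f}")
    print(f"accurate verifier: {good:.3f}")
    print(f"unreliable verifier: {bad:.3f}")
```

In this toy model, one extra attempt changes accuracy by `p * (1 - p) * (1 - e_neg - e_pos)`, so reflection helps exactly when `e_neg + e_pos < 1` and hurts when the verifier is worse than chance. The paper's actual conditions may differ.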
