AI · Jun 28, 2025

Improving Rationality in the Reasoning Process of Language Models through Self-playing Game

arXiv:2506.22920v2 · 6 citations · h-index: 11 · ICML
Originality: Incremental advance
AI Analysis

The paper aims to improve how well language models comprehend their own reasoning, which matters for applications that require reliable stepwise logic. The advance is incremental because it builds on existing self-play methods.

The paper tackles the problem that large language models lack true comprehension of their own reasoning processes. It introduces a self-play game, the Critic-Discernment Game (CDG), to improve rationality without supervision, and reports significant improvements on tasks such as mathematical reasoning and long-chain reasoning.

Large language models (LLMs) have demonstrated considerable reasoning abilities in various tasks such as mathematics and coding. However, recent studies indicate that even the best models lack true comprehension of their reasoning processes. In this paper, we explore how self-play can enhance the rationality of models in the reasoning process without supervision from humans or superior models. We design a Critic-Discernment Game (CDG) in which a prover first provides a solution to a given problem and is subsequently challenged by critiques of its solution. These critiques either aim to assist or mislead the prover. The objective of the prover is to maintain the correct answer when faced with misleading comments, while correcting errors in response to constructive feedback. Our experiments on tasks involving mathematical reasoning, stepwise error detection, self-correction, and long-chain reasoning demonstrate that CDG training can significantly improve the ability of well-aligned LLMs to comprehend their reasoning process.
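The prover's objective described above can be made concrete with a small sketch. It is illustrative only and is not the paper's implementation. The names `CritiqueKind`, `Round`, and `prover_outcome`, the outcome labels, and the exact-match comparison against a reference answer are all assumptions made for this example. The abstract does not specify how rounds are scored or how rewards are assigned.

```python
from dataclasses import dataclass
from enum import Enum


class CritiqueKind(Enum):
    # Hypothetical labels for the two critique intents named in the abstract.
    HELPFUL = "helpful"        # aims to help the prover fix a wrong solution
    MISLEADING = "misleading"  # aims to talk the prover out of a correct solution


@dataclass(frozen=True)
class Round:
    reference_answer: str        # ground-truth answer (assumed to be available)
    initial_answer: str          # prover's answer before seeing the critique
    critique_kind: CritiqueKind  # intent of the critique shown to the prover
    final_answer: str            # prover's answer after seeing the critique


def prover_outcome(r: Round) -> str:
    """Classify one round from the prover's side (labels are illustrative)."""
    was_correct = r.initial_answer == r.reference_answer
    is_correct = r.final_answer == r.reference_answer

    if r.critique_kind is CritiqueKind.MISLEADING and was_correct:
        # Goal: keep the correct answer despite the misleading critique.
        return "held_correct_answer" if is_correct else "was_misled"
    if r.critique_kind is CritiqueKind.HELPFUL and not was_correct:
        # Goal: use the constructive critique to repair the error.
        return "fixed_error" if is_correct else "failed_to_fix"
    # Pairings the abstract does not describe (e.g. a misleading critique
    # of an already wrong solution) are left unclassified in this sketch.
    return "out_of_scope"


if __name__ == "__main__":
    example = Round("42", "42", CritiqueKind.MISLEADING, "41")
    print(prover_outcome(example))
```

In this sketch the prover succeeds in exactly two cases: it holds a correct answer under a misleading critique, or it fixes a wrong answer after a helpful one. How such outcomes would be turned into training signal for the prover and the critic is left open here.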
