CLAILGApr 11, 2025

Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning

arXiv:2504.08672v124 citationsh-index: 11Has CodeACL
Originality Highly original
AI Analysis

This addresses the scalability and cost issues in post-training for LLM reasoning, offering a generalizable solution for unsupervised self-improvement.

The authors tackled the problem of scaling LLM reasoning without external supervision by introducing Genius, a purely unsupervised self-training framework that uses stepwise foresight re-sampling and advantage-calibrated optimization, achieving improved reasoning performance without annotation costs.

Advancing LLM reasoning skills has captivated wide interest. However, current post-training techniques rely heavily on supervisory signals, such as outcome supervision or auxiliary reward models, which face the problem of scalability and high annotation costs. This motivates us to enhance LLM reasoning without the need for external supervision. We introduce a generalizable and purely unsupervised self-training framework, named Genius. Without external auxiliary, Genius requires to seek the optimal response sequence in a stepwise manner and optimize the LLM. To explore the potential steps and exploit the optimal ones, Genius introduces a stepwise foresight re-sampling strategy to sample and estimate the step value by simulating future outcomes. Further, we recognize that the unsupervised setting inevitably induces the intrinsic noise and uncertainty. To provide a robust optimization, we propose an advantage-calibrated optimization (ACO) loss function to mitigate estimation inconsistencies. Combining these techniques together, Genius provides an advanced initial step towards self-improve LLM reasoning with general queries and without supervision, revolutionizing reasoning scaling laws given the vast availability of general queries. The code will be released at https://github.com/xufangzhi/Genius.

Code Implementations2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes