CL · Nov 1, 2025

OpenSIR: Open-Ended Self-Improving Reasoner

arXiv:2511.00602v1 · 3 citations · h-index: 13
Originality: Incremental advance
AI Analysis

This work addresses how LLMs can keep improving at open-ended reasoning without annotated datasets, which may otherwise prevent them from surpassing human-level performance. It appears incremental, since it builds on existing self-play methods.

The paper tackles the dependence of LLM reasoning training on annotated datasets by introducing OpenSIR, a self-play framework in which a single LLM generates and solves novel problems without external supervision. Reported gains include Llama-3.2-3B-Instruct improving from 73.9 to 78.3 on GSM8K and from 28.8 to 34.4 on College Math.
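To put the reported Llama-3.2-3B-Instruct scores in perspective, the absolute gain (in benchmark points) and the relative gain (as a percentage of the starting score) can be computed directly from the numbers above. The helper name `gains` is illustrative only; the scores are the ones stated in the summary.

```python
# Scores reported in the summary (before -> after OpenSIR training).
REPORTED = {
    ("Llama-3.2-3B-Instruct", "GSM8K"): (73.9, 78.3),
    ("Llama-3.2-3B-Instruct", "College Math"): (28.8, 34.4),
}


def gains(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent), each rounded to one decimal."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return round(absolute, 1), round(relative, 1)


for (model, bench), (before, after) in REPORTED.items():
    abs_gain, rel_gain = gains(before, after)
    print(f"{model} on {bench}: +{abs_gain} points ({rel_gain}% relative)")
```

The GSM8K improvement is about 6% relative to the starting score, while the College Math improvement, starting from a lower baseline, is roughly 19% relative.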

Recent advances in large language model (LLM) reasoning through reinforcement learning rely on annotated datasets for verifiable rewards, which may limit models' ability to surpass human-level performance. While self-play offers a promising alternative, existing approaches depend on external verifiers or cannot learn open-endedly. We present Open-Ended Self-Improving Reasoner (OpenSIR), a self-play framework where an LLM learns to generate and solve novel problems by alternating teacher and student roles without external supervision. To generate novel problems, OpenSIR optimises for both difficulty and diversity, rewarding problems that challenge appropriately while exploring distinct concepts, enabling open-ended mathematical discovery. Starting from a single trivial seed problem, OpenSIR substantially improves instruction models: Llama-3.2-3B-Instruct advances from 73.9 to 78.3 on GSM8K, and from 28.8 to 34.4 on College Math, while Gemma-2-2B-Instruct rises from 38.5 to 58.7 on GSM8K. Our analyses reveal that OpenSIR achieves open-ended learning through co-evolving teacher-student roles that adaptively calibrate difficulty and drive diverse exploration, progressing autonomously from basic to advanced mathematics.
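The abstract says the teacher role is rewarded for problems that are both appropriately difficult and diverse, but it does not give the formulas. The sketch below is a minimal, hypothetical illustration of how such a combined reward could work, under three loudly stated assumptions: (1) with no external verifier, the fraction of the student's sampled answers that agree with the majority answer serves as a proxy solve rate; (2) difficulty is rewarded most when that rate sits near a target of 0.5; and (3) diversity is scored as the share of a problem's concept tags not explored before. The names `majority_answer`, `difficulty_reward`, `diversity_reward`, `ConceptPool`, the target value and the weight `alpha` are all invented for illustration and are not OpenSIR's actual formulation.

```python
from collections import Counter
from dataclasses import dataclass, field


def majority_answer(answers: list[str]) -> tuple[str, float]:
    """Return the most common answer and the fraction of samples agreeing with it.

    Assumption: self-consistency (agreement among the student's own samples)
    stands in for correctness, since no external verifier is available.
    """
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / len(answers)


def difficulty_reward(solve_rate: float, target: float = 0.5) -> float:
    """Hypothetical shaping: 1.0 at the target solve rate, falling linearly to 0.0."""
    return max(0.0, 1.0 - abs(solve_rate - target) / max(target, 1.0 - target))


def diversity_reward(concepts: set[str], seen: set[str]) -> float:
    """Hypothetical novelty score: fraction of the problem's concepts not seen before."""
    if not concepts:
        return 0.0
    return len(concepts - seen) / len(concepts)


@dataclass
class ConceptPool:
    """Tracks concepts the teacher has already explored (hypothetical bookkeeping)."""

    seen: set[str] = field(default_factory=set)

    def teacher_reward(
        self, concepts: set[str], student_answers: list[str], alpha: float = 0.5
    ) -> float:
        """Blend difficulty and diversity, then record the problem's concepts as explored."""
        _, solve_rate = majority_answer(student_answers)
        reward = alpha * difficulty_reward(solve_rate) + (1.0 - alpha) * diversity_reward(
            concepts, self.seen
        )
        self.seen |= concepts
        return reward


pool = ConceptPool()
# New concept, student split 50/50 across its samples: maximal difficulty and novelty.
print(pool.teacher_reward({"arithmetic"}, ["2", "2", "3", "3"]))  # 1.0
# Same concept again, student always agrees: no difficulty or novelty reward.
print(pool.teacher_reward({"arithmetic"}, ["2", "2", "2", "2"]))  # 0.0
```

Under these assumptions, repeating an already-explored concept that the student solves consistently earns nothing, which illustrates the kind of pressure toward harder and more varied problems that the abstract describes.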

