Reasoning Core: A Scalable RL Environment for LLM Symbolic Reasoning
The work addresses the need for better reasoning benchmarks in AI research, though the contribution is incremental: it applies existing RLVR concepts to a new environment.
The paper aims to advance symbolic reasoning in LLMs by introducing Reasoning Core, a scalable RL environment with verifiable rewards that procedurally generates problems across formal domains. Initial zero-shot evaluations show that its tasks are difficult for frontier LLMs, which positions it as a resource for improving them.
We introduce Reasoning Core, a new scalable environment for Reinforcement Learning with Verifiable Rewards (RLVR), designed to advance foundational symbolic reasoning in Large Language Models (LLMs). Unlike existing benchmarks that focus on games or isolated puzzles, Reasoning Core procedurally generates problems across core formal domains, including PDDL planning, first-order logic, context-free grammar parsing, causal reasoning, and system equation solving. The environment is built on key design principles of high-generality problem distributions, verification via external tools, and continuous difficulty control, which together provide a virtually infinite supply of novel training instances. Initial zero-shot evaluations with frontier LLMs confirm the difficulty of Reasoning Core's tasks, positioning it as a promising resource to improve the reasoning capabilities of future models.
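The three design principles named in the abstract (a general problem distribution, verification that does not rely on a stored answer, and a continuous difficulty knob) can be sketched in a few lines of Python. This is a minimal illustration for linear systems only, not Reasoning Core's actual API: the names `Problem`, `generate`, and `verify`, the mapping from difficulty to problem size, and the direct substitution check standing in for an external verification tool are all assumptions made for this example.

```python
import random
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class Problem:
    coeffs: list[list[int]]  # coefficient matrix A (hypothetical layout)
    rhs: list[int]           # right-hand side b
    reference: list[int]     # one known solution, kept only for testing
    prompt: str              # text that would be shown to the model


def generate(difficulty: float, rng: random.Random) -> Problem:
    """Sample a linear system; `difficulty` in [0, 1] scales its size.

    The mapping below (2 to 5 unknowns, coefficient bound 3 to 10) is an
    arbitrary choice for this sketch, not taken from the paper.
    """
    n = 2 + round(difficulty * 3)
    bound = 3 + round(difficulty * 7)
    solution = [rng.randint(-bound, bound) for _ in range(n)]
    a = [[rng.randint(-bound, bound) for _ in range(n)] for _ in range(n)]
    # Build b from a planted solution so the system is always solvable.
    b = [sum(c * x for c, x in zip(row, solution)) for row in a]
    lines = [
        " + ".join(f"{c}*x{j}" for j, c in enumerate(row)) + f" = {t}"
        for row, t in zip(a, b)
    ]
    return Problem(a, b, solution, "Solve for x0..x%d:\n" % (n - 1) + "\n".join(lines))


def verify(problem: Problem, candidate) -> float:
    """Return reward 1.0 if `candidate` satisfies every equation exactly, else 0.0.

    Substitution with exact rational arithmetic stands in for an external
    solver or checker.
    """
    if len(candidate) != len(problem.rhs):
        return 0.0
    for row, target in zip(problem.coeffs, problem.rhs):
        if sum(Fraction(c) * Fraction(x) for c, x in zip(row, candidate)) != target:
            return 0.0
    return 1.0


if __name__ == "__main__":
    rng = random.Random(0)
    p = generate(0.5, rng)
    print(p.prompt)
    print("reward for planted solution:", verify(p, p.reference))
```

Because the verifier substitutes the candidate into the equations instead of comparing it with the planted solution, any exact solution earns the reward, even when a sampled system happens to be underdetermined.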