LG · Sep 30, 2025

Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models

MILA
arXiv:2509.26626v1 · 12 citations · h-index: 56 · Has Code
Originality: Highly original
AI Analysis

This work addresses the challenge of improving LLM reasoning performance at inference time, a problem relevant to AI researchers and practitioners, and represents an incremental improvement over existing test-time scaling methods.

The paper tackles the problem of improving large language model capabilities through test-time scaling. It proposes Recursive Self-Aggregation (RSA), which combines parallel and sequential scaling to refine reasoning chains and yields substantial performance gains across diverse tasks and models. For example, RSA enables Qwen3-4B-Instruct-2507 to achieve performance competitive with larger models such as DeepSeek-R1 and o3-mini.

Test-time scaling methods improve the capabilities of large language models (LLMs) by increasing the amount of compute used during inference to make a prediction. Inference-time compute can be scaled in parallel by choosing among multiple independent solutions or sequentially through self-refinement. We propose Recursive Self-Aggregation (RSA), a test-time scaling method inspired by evolutionary methods that combines the benefits of both parallel and sequential scaling. Each step of RSA refines a population of candidate reasoning chains through aggregation of subsets to yield a population of improved solutions, which are then used as the candidate pool for the next iteration. RSA exploits the rich information embedded in the reasoning chains, not just the final answers, and enables bootstrapping from partially correct intermediate steps within different chains of thought. Empirically, RSA delivers substantial performance gains with increasing compute budgets across diverse tasks, model families and sizes. Notably, RSA enables Qwen3-4B-Instruct-2507 to achieve competitive performance with larger reasoning models, including DeepSeek-R1 and o3-mini (high), while outperforming purely parallel and sequential scaling strategies across AIME-25, HMMT-25, Reasoning Gym, LiveCodeBench-v6, and SuperGPQA. We further demonstrate that training the model to combine solutions via a novel aggregation-aware reinforcement learning approach yields significant performance gains. Code available at https://github.com/HyperPotatoNeo/RSA.
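The population loop described in the abstract (sample candidates, repeatedly aggregate random subsets into a new population, reuse that population as the next candidate pool) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation at https://github.com/HyperPotatoNeo/RSA: the `generate` interface, the `aggregation_prompt` wording, the parameter names `population_size`, `subset_size` and `steps`, and majority voting as the final selection rule are all hypothetical. `make_toy_model` is a deterministic stand-in for an LLM so the loop runs offline; its "aggregation" simply keeps the most common answer among the candidates, whereas RSA relies on a real model reading the full reasoning chains.

```python
import random
import re
from collections import Counter
from typing import Callable, List

# Hypothetical model interface (an assumption, not the RSA repository's API):
# a function mapping a prompt string to one generated solution string.
Generate = Callable[[str], str]


def aggregation_prompt(question: str, candidates: List[str]) -> str:
    """Build a (hypothetical) prompt asking the model to merge K candidate chains."""
    parts = [f"Problem:\n{question}\n"]
    for i, cand in enumerate(candidates, start=1):
        parts.append(f"Candidate solution {i}:\n{cand}\n")
    parts.append("Combine the correct steps above into one improved solution.")
    return "\n".join(parts)


def rsa(question: str, generate: Generate, population_size: int = 8,
        subset_size: int = 3, steps: int = 5, seed: int = 0) -> List[str]:
    """Recursive self-aggregation loop over a population of reasoning chains."""
    if not 1 <= subset_size <= population_size:
        raise ValueError("subset_size must be between 1 and population_size")
    rng = random.Random(seed)
    # Initial population: independent samples (pure parallel scaling).
    population = [generate(question) for _ in range(population_size)]
    for _ in range(steps):
        new_population = []
        for _ in range(population_size):
            # Draw K distinct candidates and ask the model to aggregate them.
            subset = rng.sample(population, subset_size)
            new_population.append(generate(aggregation_prompt(question, subset)))
        # The aggregated solutions become the candidate pool for the next step.
        population = new_population
    return population


def extract_answer(solution: str) -> str:
    """Toy answer parser: assumes each solution ends with 'answer: <value>'."""
    match = re.search(r"answer: (\S+)\s*$", solution)
    return match.group(1) if match else ""


def majority_vote(population: List[str]) -> str:
    """One possible final selection rule: the most common extracted answer."""
    return Counter(extract_answer(s) for s in population).most_common(1)[0][0]


def make_toy_model(seed: int = 1) -> Generate:
    """Stand-in for an LLM so the loop runs offline; NOT a real model."""
    rng = random.Random(seed)

    def generate(prompt: str) -> str:
        if "Candidate solution" not in prompt:
            # Fresh sample: right answer (42) 40% of the time, else a wrong one.
            ans = "42" if rng.random() < 0.4 else rng.choice(["7", "13", "99"])
            return f"reasoning ... answer: {ans}"
        # Toy aggregation: keep the answer most common among the candidates.
        answers = re.findall(r"answer: (\S+)", prompt)
        best = Counter(answers).most_common(1)[0][0]
        return f"aggregated reasoning ... answer: {best}"

    return generate
```

For example, `majority_vote(rsa("What is 6 * 7?", make_toy_model()))` runs the loop end to end on the toy model. In this sketch, `steps=0` reduces to parallel sampling followed by voting, while `population_size=1, subset_size=1` reduces to sequential refinement of a single chain, which mirrors the abstract's framing of RSA as combining both regimes.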
