CL, AI, LG · Oct 1, 2025

Training Large Language Models To Reason In Parallel With Global Forking Tokens

arXiv:2510.05132v2 · 2 citations · h-index: 27
Originality: Incremental advance
AI Analysis

This work aims to improve parallel reasoning in LLMs and is relevant to AI researchers and practitioners. It offers an incremental improvement over existing fine-tuning methods.

The paper tackles the challenge of generating reasoning paths in large language models that are both diverse and accurate. It introduces Set Supervised Fine-Tuning (SSFT), which adds a set-based global loss to SFT and uses self-supervised bipartite matching between global forking tokens and unique reasoning traces to preserve distinct reasoning modes. The authors report that SSFT consistently outperforms standard SFT on multiple reasoning benchmarks under both the Pass@1 and Cons@k metrics.
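The two metrics named above can be illustrated with a minimal Python sketch. It assumes the commonly used definitions, which this summary does not spell out: Pass@1 as the fraction of sampled answers that are correct, and Cons@k as whether the majority-vote answer over k samples is correct. The function names `pass_at_1` and `cons_at_k` are hypothetical, and the paper's exact evaluation protocol may differ.

```python
from collections import Counter

def pass_at_1(samples, reference):
    """Fraction of sampled final answers that equal the reference answer."""
    return sum(answer == reference for answer in samples) / len(samples)

def cons_at_k(samples, reference, k):
    """1.0 if the majority-vote answer among the first k samples is correct, else 0.0.

    Ties are broken in favour of the answer that appeared first,
    which is an arbitrary choice made for this sketch.
    """
    votes = Counter(samples[:k])
    majority_answer, _ = votes.most_common(1)[0]
    return float(majority_answer == reference)

# Four sampled answers to one problem whose reference answer is "42".
samples = ["42", "41", "42", "7"]
print(pass_at_1(samples, "42"))     # two of four samples are correct
print(cons_at_k(samples, "42", 4))  # "42" wins the majority vote
```

Averaging either function over all problems in a benchmark would give the benchmark-level score under these assumed definitions.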

Although LLMs have demonstrated improved performance by scaling parallel test-time compute, doing so relies on generating reasoning paths that are both diverse and accurate. For challenging problems, the forking tokens that trigger diverse yet correct reasoning modes are typically deep in the sampling tree. Consequently, common strategies to encourage diversity, such as temperature scaling, encounter a worsened trade-off between diversity and accuracy. Motivated by this challenge, we treat parallel reasoning as a set-of-next-token-prediction problem, and incorporate a set-based global loss into Supervised Fine-Tuning (SFT) using self-supervised bipartite matching between our global forking tokens and unique reasoning traces. We observe that, while naive fine-tuning with multiple reasoning traces collapses these unique reasoning modes, our proposed method, Set Supervised Fine-Tuning (SSFT), preserves these modes and produces emergent global forking tokens. Experiments on multiple reasoning benchmarks show that our SSFT consistently outperforms SFT under both Pass@1 and Cons@k metrics.
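The bipartite matching step described in the abstract can be sketched as a minimum-cost assignment between forking tokens and reasoning traces. The sketch below is illustrative only and rests on assumptions not stated in the abstract: the cost entry `cost[i][j]` is taken to be something like the negative log-likelihood of trace `j` when generation is conditioned on forking token `i`, the set loss is taken to be the sum of matched costs, and the matching is found by brute force rather than by whatever solver the authors use. The name `match_tokens_to_traces` is hypothetical.

```python
from itertools import permutations

def match_tokens_to_traces(cost):
    """Find the minimum-cost one-to-one matching of forking tokens to traces.

    cost[i][j] is the (assumed) loss of reasoning trace j when conditioned
    on forking token i. Returns (assignment, total), where assignment[i]
    is the index of the trace matched to token i.
    Brute force over all assignments, so only suitable for small sets.
    """
    n_tokens = len(cost)
    n_traces = len(cost[0])
    if n_tokens > n_traces:
        raise ValueError("this sketch assumes at least as many traces as tokens")

    best_total = float("inf")
    best_assignment = None
    # Each candidate assigns a distinct trace to every token.
    for candidate in permutations(range(n_traces), n_tokens):
        total = sum(cost[i][j] for i, j in enumerate(candidate))
        if total < best_total:
            best_total, best_assignment = total, candidate
    return list(best_assignment), best_total

# Toy costs for 3 forking tokens (rows) and 3 reasoning traces (columns).
cost = [
    [0.9, 0.2, 0.7],
    [0.3, 0.8, 0.6],
    [0.5, 0.4, 0.1],
]
assignment, set_loss = match_tokens_to_traces(cost)
print(assignment, set_loss)
```

In a training loop, the matched pairs would determine which forking token is prepended to which trace before the usual next-token loss is computed. For larger sets, a polynomial-time solver such as the Hungarian algorithm would replace the brute-force search.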
