CL · AI · LG · Sep 26, 2025

Variational Reasoning for Language Models

arXiv:2509.22637v2 · 4 citations · h-index: 19 · Has Code
Originality: Incremental advance
AI Analysis

This work targets stronger reasoning in language models, which matters for applications that require complex problem-solving. The contribution appears incremental because it builds on existing variational inference and RL methods.

The authors introduce a variational reasoning framework that treats thinking traces as latent variables and optimizes them through variational inference. They validate the method empirically on the Qwen 2.5 and Qwen 3 model families across reasoning tasks and report stable objectives and improved reasoning ability.
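
To make "thinking traces as latent variables" concrete, the standard single-trace evidence lower bound can be sketched as below. The notation is assumed here rather than taken from this page: x is the question, z the thinking trace, y the answer, \pi_\theta the language model, and q_\phi the variational posterior. The paper's exact objective may differ in detail.

```latex
\begin{align*}
\log \pi_\theta(y \mid x)
  &= \log \sum_{z} \pi_\theta(z \mid x)\, \pi_\theta(y \mid x, z) \\
  &\ge \mathbb{E}_{z \sim q_\phi(\cdot \mid x, y)}
     \big[ \log \pi_\theta(z \mid x) + \log \pi_\theta(y \mid x, z)
           - \log q_\phi(z \mid x, y) \big]
\end{align*}
```

The gap between the two sides is the KL divergence from q_\phi(z \mid x, y) to the true posterior \pi_\theta(z \mid x, y), so the bound is tight exactly when the variational posterior matches the true posterior over traces.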

We introduce a variational reasoning framework for language models that treats thinking traces as latent variables and optimizes them through variational inference. Starting from the evidence lower bound (ELBO), we extend it to a multi-trace objective for tighter bounds and propose a forward-KL formulation that stabilizes the training of the variational posterior. We further show that rejection sampling finetuning and binary-reward RL, including GRPO, can be interpreted as local forward-KL objectives, where an implicit weighting by model accuracy naturally arises from the derivation and reveals a previously unnoticed bias toward easier questions. We empirically validate our method on the Qwen 2.5 and Qwen 3 model families across a wide range of reasoning tasks. Overall, our work provides a principled probabilistic perspective that unifies variational inference with RL-style methods and yields stable objectives for improving the reasoning ability of language models. Our code is available at https://github.com/sail-sg/variational-reasoning.
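
Two claims in the abstract can be checked on a toy problem: a multi-trace objective gives a tighter bound than the single-trace ELBO, and rejection-sampling-style updates carry an implicit weighting by model accuracy. The sketch below is not the authors' code. It assumes an importance-weighted (IWAE-style) form for the multi-trace bound, a softmax policy over three hypothetical traces, and made-up probabilities; all function names are illustrative.

```python
import math
from itertools import product

# Toy setup: three possible thinking traces z for a single question x.
# All numbers are made up for illustration.
prior = [0.5, 0.3, 0.2]      # pi(z | x): model probability of each trace
p_correct = [0.9, 0.2, 0.6]  # pi(y | x, z): chance that trace z yields the answer y
q = [0.6, 0.1, 0.3]          # q(z | x, y): a hypothetical variational posterior

# Importance weight of each trace: joint probability over proposal probability.
w = [prior[i] * p_correct[i] / q[i] for i in range(3)]


def log_evidence():
    """Exact log pi(y | x), marginalising over all traces."""
    return math.log(sum(p * c for p, c in zip(prior, p_correct)))


def multi_trace_bound(k):
    """Exact E[log(mean of k importance weights)] with k traces drawn i.i.d. from q.

    k = 1 recovers the ordinary single-trace ELBO.
    """
    total = 0.0
    for idx in product(range(3), repeat=k):
        prob = math.prod(q[i] for i in idx)
        total += prob * math.log(sum(w[i] for i in idx) / k)
    return total


def rft_gradient(policy, reward):
    """Expected gradient w.r.t. softmax logits of reward-filtered log-likelihood:
    E_{z ~ policy}[ r(z) * grad log policy(z) ], with binary reward r."""
    n = len(policy)
    return [
        sum(policy[z] * reward[z] * ((z == j) - policy[j]) for z in range(n))
        for j in range(n)
    ]


def accuracy_weighted_forward_kl_gradient(policy, reward):
    """Accuracy times the gradient of -KL(posterior || policy) w.r.t. the logits,
    where posterior(z) is the policy restricted to correct traces (held fixed)."""
    n = len(policy)
    acc = sum(p * r for p, r in zip(policy, reward))
    post = [p * r / acc for p, r in zip(policy, reward)]
    grad = [
        sum(post[z] * ((z == j) - policy[j]) for z in range(n)) for j in range(n)
    ]
    return acc, [acc * g for g in grad]


if __name__ == "__main__":
    for k in (1, 2, 4):
        print(f"k={k}: bound = {multi_trace_bound(k):.4f}")
    print(f"log evidence = {log_evidence():.4f}")

    reward = [1, 0, 1]  # hypothetical binary correctness of each trace
    acc, weighted = accuracy_weighted_forward_kl_gradient(prior, reward)
    print("accuracy:", acc)
    print("RFT gradient:        ", rft_gradient(prior, reward))
    print("accuracy * fKL grad: ", weighted)
```

In this toy setting the two gradient vectors coincide, so a question the model already answers correctly more often contributes a proportionally larger update. This is one way to read the bias toward easier questions that the abstract describes.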

