CL, Jan 16

Budget-Aware Anytime Reasoning with LLM-Synthesized Preference Data

Xuanming Zhang, Shwan Ashrafi, Aziza Mirsaidova, Amir Rezaeian, Miguel Ballesteros, Lydia B. Chilton, Zhou Yu, Dan Roth

arXiv:2601.11038v11.11 citations, h-index: 7

Originality: Highly original

AI Analysis

This work addresses the practical need for efficient reasoning in real-world tasks, such as trip planning, under budget constraints. It represents an incremental improvement, offering a novel method for a known bottleneck.

The paper studies how large language models (LLMs) reason under limited computation budgets. It introduces an anytime reasoning framework and a self-improvement method based on LLM-synthesized preference data, and reports consistent gains in reasoning quality and efficiency across multiple models and datasets.

We study the reasoning behavior of large language models (LLMs) under limited computation budgets. In such settings, producing useful partial solutions quickly is often more practical than exhaustive reasoning, which incurs high inference costs. Many real-world tasks, such as trip planning, require models to deliver the best possible output within a fixed reasoning budget. We introduce an anytime reasoning framework and the Anytime Index, a metric that quantifies how effectively solution quality improves as reasoning tokens increase. To further enhance efficiency, we propose an inference-time self-improvement method using LLM-synthesized preference data, where models learn from their own reasoning comparisons to produce better intermediate solutions. Experiments on NaturalPlan (Trip), AIME, and GPQA datasets show consistent gains across Grok-3, GPT-oss, GPT-4.1/4o, and LLaMA models, improving both reasoning quality and efficiency under budget constraints.
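
The abstract does not give the formula for the Anytime Index, so the sketch below is not the paper's definition. It only illustrates the general idea of scoring how quickly solution quality rises with the reasoning budget, using one plausible stand-in: the normalized area under a quality-versus-tokens curve. The function name `anytime_score`, the assumption that quality lies in [0, 1], and the trapezoidal rule are all assumptions made for illustration.

```python
from typing import Sequence


def anytime_score(budgets: Sequence[float], qualities: Sequence[float]) -> float:
    """Normalized area under a quality-vs-budget curve (trapezoidal rule).

    Hypothetical stand-in for illustration; NOT the paper's Anytime Index.
    Assumes budgets are strictly increasing token counts and that each
    quality value lies in [0, 1], so the result also lies in [0, 1].
    """
    if len(budgets) != len(qualities) or len(budgets) < 2:
        raise ValueError("need at least two matching (budget, quality) points")
    if any(b2 <= b1 for b1, b2 in zip(budgets, budgets[1:])):
        raise ValueError("budgets must be strictly increasing")

    area = 0.0
    for i in range(1, len(budgets)):
        width = budgets[i] - budgets[i - 1]
        # Average quality over this budget interval, weighted by its width.
        area += width * (qualities[i] + qualities[i - 1]) / 2.0

    # Divide by the total budget span so curves over different ranges compare.
    return area / (budgets[-1] - budgets[0])


# Two made-up curves that reach the same final quality at 200 tokens.
steady = anytime_score([0, 100, 200], [0.0, 0.5, 1.0])
early = anytime_score([0, 100, 200], [0.0, 1.0, 1.0])
```

Under this stand-in, a model that produces a good partial solution early (`early`) scores higher than one that reaches the same final quality only at the full budget (`steady`), which is the behavior an anytime metric is meant to reward.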

