AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking in Large Language Models
AdaReasoner addresses the problem of suboptimal fixed configurations in LLMs for tasks such as joke generation and mathematical reasoning, offering an incremental improvement through adaptive automation.
The paper tackles the problem that LLMs need effective configurations for tasks requiring sophisticated reasoning. It introduces AdaReasoner, an LLM-agnostic plugin that automates adaptive reasoning configurations and consistently outperforms standard baselines across six LLMs and various reasoning tasks.
LLMs often need effective configurations, such as temperature and the number of reasoning steps, to handle tasks requiring sophisticated reasoning and problem-solving, ranging from joke generation to mathematical reasoning. Existing prompting approaches usually adopt general-purpose, fixed configurations that work 'well enough' across tasks but seldom achieve task-specific optimality. To address this gap, we introduce AdaReasoner, an LLM-agnostic plugin designed for any LLM to automate adaptive reasoning configurations for tasks requiring different types of thinking. AdaReasoner is trained using a reinforcement learning (RL) framework that combines a factorized action space with a targeted exploration strategy, and it uses a pretrained reward model to optimize the policy model for reasoning configurations with only a few-shot guide. AdaReasoner is backed by theoretical guarantees and by experiments demonstrating fast convergence and a sublinear policy gap. Across six different LLMs and a variety of reasoning tasks, it consistently outperforms standard baselines, preserves out-of-distribution robustness, and yields gains on knowledge-intensive tasks through tailored prompts.
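To make the idea of a factorized action space concrete, the following is a minimal sketch and not the authors' implementation. It assumes three hypothetical configuration dimensions (temperature, number of reasoning steps, instruction style), keeps one independent softmax head per dimension, and applies a plain REINFORCE update. The `toy_reward` function is an invented stand-in for the pretrained reward model, and the paper's targeted exploration strategy is not reproduced here; actions are simply sampled from the current policy.

```python
import math
import random

# Hypothetical discrete choices per configuration dimension (illustrative only).
ACTION_SPACE = {
    "temperature": [0.0, 0.5, 1.0],
    "reasoning_steps": [3, 5, 8],
    "instruction": ["step_by_step", "creative", "verify"],
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class FactorizedPolicy:
    """One independent categorical head per configuration dimension."""

    def __init__(self, action_space, lr=0.1, seed=0):
        self.action_space = action_space
        self.lr = lr
        self.rng = random.Random(seed)
        self.logits = {name: [0.0] * len(vals) for name, vals in action_space.items()}

    def sample(self):
        """Sample one index per dimension from its own softmax head."""
        idx = {}
        for name, logits in self.logits.items():
            probs = softmax(logits)
            idx[name] = self.rng.choices(range(len(probs)), weights=probs)[0]
        return idx

    def config(self, idx):
        """Map sampled indices to concrete configuration values."""
        return {name: self.action_space[name][i] for name, i in idx.items()}

    def update(self, idx, reward, baseline=0.0):
        """REINFORCE step: the log-prob gradient splits across heads."""
        advantage = reward - baseline
        for name, chosen in idx.items():
            probs = softmax(self.logits[name])
            for j, p in enumerate(probs):
                grad = (1.0 if j == chosen else 0.0) - p
                self.logits[name][j] += self.lr * advantage * grad


def toy_reward(config):
    """Invented stand-in for a reward model: favors low temperature, more steps."""
    score = 0.0
    if config["temperature"] == 0.0:
        score += 0.5
    if config["reasoning_steps"] == 8:
        score += 0.5
    return score


def train(steps=2000, seed=0):
    policy = FactorizedPolicy(ACTION_SPACE, seed=seed)
    for _ in range(steps):
        idx = policy.sample()
        policy.update(idx, toy_reward(policy.config(idx)), baseline=0.5)
    return policy


if __name__ == "__main__":
    trained = train()
    for name, logits in trained.logits.items():
        print(name, [round(p, 3) for p in softmax(logits)])
```

Because the heads are independent, the number of policy parameters grows with the sum of the per-dimension choice counts rather than their product, which is one plausible reason to factorize the action space in this way.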