Select-then-Solve: Paradigm Routing as Inference-Time Optimization for LLM Agents
This work addresses how to choose a reasoning paradigm for LLM agents across diverse tasks. It offers a learned router that outperforms any single fixed paradigm, though the contribution is incremental relative to existing methods.
The study asks whether performance gains in LLM agents come from the model or from the reasoning paradigm wrapped around it. It finds that no single paradigm dominates across tasks and that oracle per-task selection beats the best fixed paradigm by 17.1pp on average. The authors propose a lightweight embedding-based router that selects a paradigm for each task, raising average accuracy from 47.6% to 53.1% and recovering up to 37% of the oracle gap.
When an LLM-based agent improves on a task, is the gain from the model itself or from the reasoning paradigm wrapped around it? We study this question by comparing six inference-time paradigms, namely Direct, CoT, ReAct, Plan-Execute, Reflection, and ReCode, across four frontier LLMs and ten benchmarks, yielding roughly 18,000 runs. We find that reasoning structure helps dramatically on some tasks but hurts on others: ReAct improves over Direct by 44pp on GAIA, while CoT degrades performance by 15pp on HumanEval. No single paradigm dominates, and oracle per-task selection beats the best fixed paradigm by 17.1pp on average. Motivated by this complementarity, we propose a select-then-solve approach: before answering each task, a lightweight embedding-based router selects the most suitable paradigm. Across four models, the router improves average accuracy from 47.6% to 53.1%, outperforming the best fixed paradigm at 50.3% by 2.8pp and recovering up to 37% of the oracle gap. In contrast, zero-shot self-routing only works for GPT-5 at 67.1% and fails for weaker models, all trailing the learned router. Our results argue that reasoning paradigm selection should be a per-task decision made by a learned router, not a fixed architectural choice.
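To make the select-then-solve idea concrete, here is a minimal sketch of an embedding-based router. The abstract does not say which embedding model or classifier the router uses, so everything below is an assumption for illustration: `toy_embed` is a hashed bag-of-words stand-in for a real sentence-embedding model, `ParadigmRouter` is a hypothetical k-nearest-neighbour classifier with similarity-weighted voting, and the training labels are assumed to be the paradigm that scored best on each training task.

```python
import math
import zlib
from collections import defaultdict


def toy_embed(text, dim=256):
    """Hypothetical stand-in for a real embedding model: hashed bag of words."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike the built-in hash() on str.
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ParadigmRouter:
    """k-NN router: picks a paradigm by similarity-weighted vote of nearby tasks."""

    def __init__(self, embed_fn=toy_embed, k=3, default="Direct"):
        self.embed_fn = embed_fn
        self.k = k
        self.default = default  # returned when the router has no training data
        self.examples = []      # list of (embedding, best_paradigm) pairs

    def fit(self, tasks, best_paradigms):
        # Assumption: each label is the paradigm that performed best on that task.
        self.examples = [
            (self.embed_fn(task), paradigm)
            for task, paradigm in zip(tasks, best_paradigms)
        ]
        return self

    def route(self, task):
        if not self.examples:
            return self.default
        query = self.embed_fn(task)
        scored = sorted(
            ((cosine(query, emb), paradigm) for emb, paradigm in self.examples),
            key=lambda pair: pair[0],
            reverse=True,
        )
        votes = defaultdict(float)
        for similarity, paradigm in scored[: self.k]:
            votes[paradigm] += similarity
        return max(votes, key=votes.get)


def select_then_solve(task, router, solvers):
    """Route first, then run the chosen paradigm's solver on the task."""
    paradigm = router.route(task)
    return paradigm, solvers[paradigm](task)
```

In use, `solvers` would be a hypothetical mapping from paradigm names such as `"Direct"` or `"ReAct"` to callables that run the agent under that paradigm. Because routing only needs one embedding and a similarity lookup, it adds little cost ahead of the actual solve. Swapping `toy_embed` for a real embedding model only requires passing a different `embed_fn`.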