CL · LG · MA · Apr 8

ReDAct: Uncertainty-Aware Deferral for LLM Agents

arXiv:2604.07036
AI Analysis

ReDAct addresses the cost-reliability tradeoff for LLM agents in settings such as text-based embodied environments, offering a practical route to deployment.

The paper tackles hallucination by LLM agents in sequential decision-making. It proposes ReDAct, which uses a small, cheap LLM by default and defers uncertain decisions to a large, expensive LLM. Deferring only about 15% of decisions matches the quality of using the large model exclusively while significantly reducing inference costs.

Recently, LLM-based agents have become increasingly popular across many applications, including complex sequential decision-making problems. However, they inherit the tendency of LLMs to hallucinate, leading to incorrect decisions. In sequential settings, even a single mistake can irreversibly degrade the trajectory, making hallucinations an even bigger problem. Although larger LLMs hallucinate less, they incur a significantly higher per-token cost. In this paper, we address this tradeoff by proposing ReDAct (Reason-Defer-Act). In ReDAct, an agent is equipped with two LLMs: a small, cheap model used by default, and a large, more reliable but expensive model. When the predictive uncertainty of the small model exceeds a calibrated threshold, the decision is deferred to the large model. We evaluate our approach in text-based embodied environments such as ALFWorld and MiniGrid and show that deferring only about 15% of decisions to the large model can match the quality of using it exclusively, while significantly reducing inference costs.
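The deferral rule described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes that predictive uncertainty is the Shannon entropy of the small model's action distribution, and that the threshold is calibrated as an empirical quantile of held-out uncertainty scores chosen to hit a target deferral rate. The abstract does not specify either choice, and the names `entropy`, `calibrate_threshold`, and `redact_step` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
# the small model returns a probability for each candidate action,
# the large model returns a single chosen action.
ActionDist = Dict[str, float]
SmallModel = Callable[[str], ActionDist]
LargeModel = Callable[[str], str]


def entropy(dist: ActionDist) -> float:
    """Shannon entropy (in nats) of an action distribution.

    Assumed here as the uncertainty score; the paper may use another measure.
    """
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def calibrate_threshold(scores: List[float], defer_rate: float) -> float:
    """Pick a threshold so that roughly `defer_rate` of the calibration
    scores lie strictly above it (an empirical-quantile assumption)."""
    if not scores:
        raise ValueError("need at least one calibration score")
    ordered = sorted(scores)
    k = math.ceil((1.0 - defer_rate) * len(ordered)) - 1
    k = min(max(k, 0), len(ordered) - 1)  # clamp to a valid index
    return ordered[k]


def redact_step(
    observation: str,
    small: SmallModel,
    large: LargeModel,
    threshold: float,
) -> Tuple[str, bool]:
    """Return (action, deferred). Defer when the small model's
    uncertainty exceeds the calibrated threshold."""
    dist = small(observation)
    if entropy(dist) > threshold:
        return large(observation), True
    # Otherwise act on the small model's most probable action.
    return max(dist, key=dist.get), False


# Toy stand-ins for the two LLMs, for illustration only.
def toy_small(observation: str) -> ActionDist:
    if observation == "hallway":
        return {"go north": 0.9, "go south": 0.1}
    return {"go north": 0.5, "go south": 0.5}


def toy_large(observation: str) -> str:
    return "open door"
```

With a threshold of 0.5 nats, the toy small model acts on its own in the "hallway" observation, where its distribution is peaked, and defers on any other observation, where it is split evenly between the two actions. In practice the threshold would come from `calibrate_threshold` applied to uncertainty scores collected on calibration trajectories.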

