When Planning Fails Despite Correct Execution: On Epistemic Calibration for LLM-Based Multi-Agent Systems
Addresses a latent failure mode in LLM-based multi-agent planning for practitioners, offering a moderate improvement over baselines.
LLM-based multi-agent systems can fail due to epistemic miscalibration during planning, where agents misjudge their knowledge even when actions execute correctly. The proposed EPC-AW workflow improves system-level success by an average of 9.75% by selecting plans with stable evaluations across agents and refining calibration over time.
LLM-based multi-agent systems can fail even when planned actions are executed correctly because agents may misjudge their knowledge when evaluating plan feasibility, a phenomenon we term epistemic miscalibration in planning. Unlike execution errors, epistemic miscalibration is latent during planning, as generated plans can remain self-consistent and executable without observable errors; the miscalibration is also dynamic, as new information can alter feasibility assessments, potentially obscuring past miscalibration signals and causing them to recur over time. To address this, we propose the Epistemic Planning Calibration Agentic Workflow (EPC-AW), which assesses whether plans remain supported under varying information conditions rather than directly verifying feasibility. EPC-AW employs Information-consistency-based Plan Selection, selecting plans whose evaluations are stable across agents, together with Consistency-guided Epistemic State Refinement to adapt calibration over time by leveraging past discrepancies to guide future planning. Experiments show that EPC-AW improves system-level success by an average of 9.75%.