AI | Apr 17

Don't Start What You Can't Finish: A Counterfactual Audit of Support-State Triage in LLM Agents

arXiv:2604.16752 | 42.8 | h-index: 7

AI Analysis

For developers of LLM agents, this work provides a structured audit to identify and mitigate overcommitment in task triage, though the results are upper-bound estimates due to single-context-window evaluation.

The paper introduces SSTA-32, a diagnostic framework to evaluate whether LLM agents can diagnose why a task is blocked before acting. Default execution overcommits on non-complete tasks (41.7% overcommitment rate), while prompting with categorical decision paths achieves 91.7% typed deferral accuracy.

Current agent evaluations largely reward execution on fully specified tasks, while recent work studies clarification [11, 22, 2], capability awareness [9, 1], abstention [8, 14], and search termination [20, 5] mostly in isolation. This leaves open whether agents can diagnose why a task is blocked before acting. We introduce the Support-State Triage Audit (SSTA-32), a matched-item diagnostic framework in which minimal counterfactual edits flip the same base request across four support states: Complete (ANSWER), Clarifiable (CLARIFY), Support-Blocked (REQUEST SUPPORT), and Unsupported-Now (ABSTAIN). We evaluate a frontier model under four prompting conditions - Direct, Action-Only, Confidence-Only, and a typed Preflight Support Check (PSC) - using Dual-Persona Auto-Auditing (DPAA) with deterministic heuristic scoring. Default execution overcommits heavily on non-complete tasks (41.7% overcommitment rate). Scalar confidence mapping avoids overcommitment but collapses the three-way deferral space (58.3% typed deferral accuracy). Conversely, both Action-Only and PSC achieve 91.7% typed deferral accuracy by surfacing the categorical ontology in the prompt. Targeted ablations confirm that removing the support-sufficiency dimension selectively degrades REQUEST SUPPORT accuracy, while removing the evidence-sufficiency dimension triggers systematic overcommitment on unsupported items. Because DPAA operates within a single context window, these results represent upper-bound capability estimates; nonetheless, the structural findings indicate that frontier models possess strong latent triage capabilities that require explicit categorical decision paths to activate safely.
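The two headline metrics in the abstract can be illustrated with a minimal sketch. The abstract does not give exact definitions, so this assumes that overcommitment means choosing ANSWER on a non-complete item, and that typed deferral accuracy is the share of non-complete items where the predicted action matches the gold deferral type. The names `Action`, `GOLD_ACTION`, `score`, and the toy records are hypothetical and are not taken from the paper.

```python
from enum import Enum


class Action(Enum):
    ANSWER = "ANSWER"
    CLARIFY = "CLARIFY"
    REQUEST_SUPPORT = "REQUEST SUPPORT"
    ABSTAIN = "ABSTAIN"


# Support state -> expected action, as listed in the abstract.
GOLD_ACTION = {
    "Complete": Action.ANSWER,
    "Clarifiable": Action.CLARIFY,
    "Support-Blocked": Action.REQUEST_SUPPORT,
    "Unsupported-Now": Action.ABSTAIN,
}


def score(records):
    """Score a list of (support_state, predicted_action) pairs.

    Assumed definitions (not confirmed by the source):
    - overcommitment: the agent chose ANSWER on a non-complete item
    - typed deferral: the agent chose the gold deferral action
    Both rates are computed over non-complete items only.
    """
    non_complete = [(s, a) for s, a in records if s != "Complete"]
    if not non_complete:
        raise ValueError("need at least one non-complete item")
    n = len(non_complete)
    overcommitted = sum(a is Action.ANSWER for _, a in non_complete)
    correctly_typed = sum(a is GOLD_ACTION[s] for s, a in non_complete)
    return {
        "overcommitment_rate": overcommitted / n,
        "typed_deferral_accuracy": correctly_typed / n,
    }


# Toy data for illustration only; these are not results from the paper.
toy_records = [
    ("Complete", Action.ANSWER),
    ("Clarifiable", Action.CLARIFY),        # correct typed deferral
    ("Support-Blocked", Action.ANSWER),     # overcommitment
    ("Unsupported-Now", Action.ABSTAIN),    # correct typed deferral
    ("Clarifiable", Action.ABSTAIN),        # deferred, but wrong type
]
result = score(toy_records)
print(result)
```

In the toy data, the last record defers without overcommitting but picks the wrong deferral type. Under these assumed definitions, that is the failure mode the abstract attributes to scalar confidence mapping: a low overcommitment rate alongside reduced typed deferral accuracy.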
