Should I Have Expressed a Different Intent? Counterfactual Generation for LLM-Based Autonomous Control
This work addresses users' need for interpretable and reliable LLM-driven agents. Its contribution is incremental, however, because it builds on existing causal modeling and conformal prediction techniques.
The paper lets users of LLM-based autonomous control reason about how a differently phrased intent might have changed the outcome. It introduces a framework that generates counterfactual outcomes with formal reliability guarantees. In a wireless network control use case, the framework shows significant advantages over naive re-execution baselines.
Large language model (LLM)-powered agents can translate high-level user intents into plans and actions in an environment. Yet after observing an outcome, users may wonder: What if I had phrased my intent differently? We introduce a framework that enables such counterfactual reasoning in agentic LLM-driven control scenarios, while providing formal reliability guarantees. Our approach models the closed-loop interaction between a user, an LLM-based agent, and an environment as a structural causal model (SCM), and leverages test-time scaling to generate multiple candidate counterfactual outcomes via probabilistic abduction. Through an offline calibration phase, the proposed conformal counterfactual generation (CCG) yields sets of counterfactual outcomes that are guaranteed to contain the true counterfactual outcome with high probability. We showcase the performance of CCG on a wireless network control use case, demonstrating significant advantages compared to naive re-execution baselines.
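The coverage guarantee in the abstract comes from an offline calibration step. The sketch below shows how a generic split-conformal calibration can turn a handful of candidate counterfactual outcomes into a set with a target coverage level. This is not the paper's CCG procedure. It makes several assumptions of its own: the outcome is a scalar KPI, the nonconformity score is the distance from the true counterfactual to the nearest candidate, and the prediction set is the union of intervals of a calibrated radius around the candidates. The function names and the toy simulator (`simulate_example`) are hypothetical stand-ins for the LLM agent, the abduction step, and the environment.

```python
import math
import random
from typing import List, Sequence, Tuple


def nonconformity(true_outcome: float, candidates: Sequence[float]) -> float:
    """Assumed score: distance from the true counterfactual to the nearest candidate."""
    return min(abs(true_outcome - c) for c in candidates)


def calibrate_radius(scores: Sequence[float], alpha: float) -> float:
    """Split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return sorted(scores)[k - 1]


def in_prediction_set(outcome: float, candidates: Sequence[float], radius: float) -> bool:
    """The set is the union of intervals [c - radius, c + radius] around the candidates."""
    return nonconformity(outcome, candidates) <= radius


def simulate_example(rng: random.Random, num_candidates: int = 5) -> Tuple[float, List[float]]:
    """Hypothetical toy model: a scalar KPI and noisy candidates from repeated sampling."""
    true_outcome = rng.gauss(10.0, 2.0)
    candidates = [true_outcome + rng.gauss(0.0, 1.0) for _ in range(num_candidates)]
    return true_outcome, candidates


def empirical_coverage(seed: int, alpha: float, n_cal: int = 500, n_test: int = 2000) -> float:
    """Calibrate offline on n_cal examples, then measure coverage on fresh test examples."""
    rng = random.Random(seed)
    cal = [simulate_example(rng) for _ in range(n_cal)]
    radius = calibrate_radius([nonconformity(y, cs) for y, cs in cal], alpha)
    test = [simulate_example(rng) for _ in range(n_test)]
    return sum(in_prediction_set(y, cs, radius) for y, cs in test) / n_test


if __name__ == "__main__":
    alpha = 0.1
    print(f"empirical coverage: {empirical_coverage(0, alpha):.3f} (target >= {1 - alpha})")
```

If calibration and test examples are exchangeable, split conformal prediction guarantees that the true outcome falls in the set with probability at least 1 - alpha, marginally over both splits. This guarantee holds whatever the quality of the candidate generator. A better generator produces tighter sets.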