Integrating Counterfactual Simulations with Language Models for Explaining Multi-Agent Behaviour
This work addresses trust concerns in autonomous multi-agent systems by making agent behaviour explainable. The contribution is incremental: it builds on existing counterfactual and LLM methods and applies them to a specific domain, autonomous driving.
The paper tackles the problem of explaining multi-agent behaviour in autonomous systems by proposing AXIS, which integrates counterfactual simulations with language models to generate human-centred action explanations. Compared to baselines, AXIS improves perceived explanation correctness by at least 7.7% across all five evaluated models and goal prediction accuracy by 23% for four of them, with comparable action prediction accuracy.
Autonomous multi-agent systems (MAS) are useful for automating complex tasks but raise trust concerns due to risks such as miscoordination or goal misalignment. Explainability is vital for users' trust calibration, but explainable MAS face challenges due to complex environments, the human factor, and non-standardised evaluation. Leveraging the counterfactual effect size model and LLMs, we propose Agentic eXplanations via Interrogative Simulation (AXIS). AXIS generates human-centred action explanations for multi-agent policies by having an LLM interrogate an environment simulator using prompts like 'whatif' and 'remove' to observe and synthesise counterfactual information over multiple rounds. We evaluate AXIS on autonomous driving across ten scenarios for five LLMs with a comprehensive methodology combining robustness, subjective preference, correctness, and goal/action prediction with an external LLM as evaluator. Compared to baselines, AXIS improves perceived explanation correctness by at least 7.7% across all models and goal prediction accuracy by 23% for four models, with comparable action prediction accuracy, achieving the highest scores overall. Our code is open-sourced at https://github.com/gyevnarb/axis.
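The interrogation loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes a toy one-lane simulator and replaces the LLM with a scripted query picker. All names here (`Agent`, `ego_brakes`, `run_query`, `scripted_interrogator`, `explain`) and the braking rule are hypothetical and chosen only to show the idea of issuing 'whatif' and 'remove' queries over several rounds and synthesising the results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    position: float  # metres along a single lane (toy assumption)
    speed: float     # metres per second


def ego_brakes(agents, horizon=3.0, safe_gap=10.0):
    """Toy policy: ego brakes if an agent ahead ends up closer than safe_gap."""
    ego = agents["ego"]
    ego_future = ego.position + ego.speed * horizon
    for name, other in agents.items():
        if name == "ego" or other.position < ego.position:
            continue  # ignore ego itself and agents behind it
        gap = (other.position + other.speed * horizon) - ego_future
        if gap < safe_gap:
            return True
    return False


def run_query(agents, query):
    """Apply one counterfactual query to a copy of the world and re-simulate."""
    kind, target, *args = query
    world = dict(agents)
    if kind == "remove":      # what happens if this agent is absent?
        world.pop(target)
    elif kind == "whatif":    # what happens if this agent drives at another speed?
        (new_speed,) = args
        world[target] = Agent(world[target].position, new_speed)
    else:
        raise ValueError(f"unknown query kind: {kind}")
    return ego_brakes(world)


def scripted_interrogator(agents, history):
    """Stand-in for the LLM: return the next untried query, or None to stop."""
    tried = {query for query, _ in history}
    for name in agents:
        if name == "ego":
            continue
        for query in (("remove", name), ("whatif", name, agents["ego"].speed)):
            if query not in tried:
                return query
    return None


def explain(agents, max_rounds=4):
    """Run several interrogation rounds, then synthesise a short explanation."""
    factual = ego_brakes(agents)
    history = []
    for _ in range(max_rounds):
        query = scripted_interrogator(agents, history)
        if query is None:
            break
        history.append((query, run_query(agents, query)))
    # An agent counts as a cause if changing it flipped ego's action.
    causes = sorted({query[1] for query, outcome in history if outcome != factual})
    action = "brakes" if factual else "keeps its speed"
    if not causes:
        return f"ego {action}; no tested counterfactual changed this."
    return f"ego {action} because of: {', '.join(causes)}."


scenario = {
    "ego": Agent(0.0, 15.0),
    "lead": Agent(30.0, 5.0),   # slow vehicle directly ahead
    "far": Agent(200.0, 15.0),  # distant vehicle, irrelevant to ego
}
print(explain(scenario))
```

In this toy scenario, removing or speeding up the slow lead vehicle flips ego's decision, while the same queries on the distant vehicle do not, so only the lead vehicle is reported as a cause. In a real system the scripted picker would be replaced by an LLM that chooses queries and writes the final explanation in natural language.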