Understanding Your Agent: Leveraging Large Language Models for Behavior Explanation
The work addresses the need for interpretability in safety-critical deployments of agents, such as robots, by providing a model-agnostic explanation method. Its contribution is incremental, since it builds on existing LLM and behavior representation techniques.
The paper tackles the problem of generating natural language explanations for intelligent agents' behavior using only observed states and actions, independent of the underlying model. In a multi-agent search-and-rescue environment, user studies and empirical experiments show that the approach produces explanations as helpful as those from a human domain expert, while also supporting follow-up interactions such as clarification and counterfactual queries.
Intelligent agents such as robots are increasingly deployed in real-world, safety-critical settings. It is vital that these agents are able to explain the reasoning behind their decisions to human counterparts; however, their behavior is often produced by uninterpretable models such as deep neural networks. We propose an approach to generate natural language explanations for an agent's behavior based only on observations of states and actions, thus making our method independent from the underlying model's representation. For such models, we first learn a behavior representation and subsequently use it to produce plausible explanations with minimal hallucination while affording user interaction with a pre-trained large language model. We evaluate our method in a multi-agent search-and-rescue environment and demonstrate the effectiveness of our explanations for agents executing various behaviors. Through user studies and empirical experiments, we show that our approach generates explanations as helpful as those produced by a human domain expert while enabling beneficial interactions such as clarification and counterfactual queries.
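The two-stage idea in the abstract (first summarize observed state-action pairs into a behavior representation, then hand that representation to a language model) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not say which behavior representation is learned, so a simple frequency table stands in for it, and the feature names, action names, and the helpers `learn_behavior_table`, `describe_decision`, and `build_prompt` are hypothetical. No language model is called; the sketch stops at building the prompt text.

```python
from collections import Counter, defaultdict

# Hypothetical observations of a search-and-rescue agent: (state, action) pairs.
# Feature and action names are illustrative assumptions, not taken from the paper.
OBSERVATIONS = [
    ({"victim_visible": True, "rubble_ahead": False}, "move_to_victim"),
    ({"victim_visible": True, "rubble_ahead": True}, "remove_rubble"),
    ({"victim_visible": False, "rubble_ahead": True}, "remove_rubble"),
    ({"victim_visible": False, "rubble_ahead": False}, "explore"),
    ({"victim_visible": True, "rubble_ahead": False}, "move_to_victim"),
]


def learn_behavior_table(observations):
    """Stand-in behavior representation: count the actions seen in each state."""
    table = defaultdict(Counter)
    for state, action in observations:
        key = tuple(sorted(state.items()))  # hashable, order-independent state key
        table[key][action] += 1
    return table


def describe_decision(table, state):
    """Return the most frequent observed action for a state and a textual rule."""
    key = tuple(sorted(state.items()))
    if key not in table:
        return None, "No observations cover this state."
    counts = table[key]
    action, count = counts.most_common(1)[0]
    total = sum(counts.values())
    conditions = " and ".join(f"{name} is {value}" for name, value in key)
    rule = (
        f"When {conditions}, the agent chose '{action}' "
        f"in {count} of {total} observations."
    )
    return action, rule


def build_prompt(rule, question):
    """Ground the language model in the observed rule to limit hallucination."""
    return (
        "You explain an agent's behavior using only the facts below.\n"
        f"Observed behavior: {rule}\n"
        f"User question: {question}\n"
        "If the facts do not answer the question, say so."
    )


table = learn_behavior_table(OBSERVATIONS)
action, rule = describe_decision(
    table, {"victim_visible": True, "rubble_ahead": False}
)
prompt = build_prompt(rule, "Why did the agent move toward the victim?")
print(prompt)
```

Because the prompt is built only from observed states and actions, the same pipeline would apply to any agent regardless of how its policy is implemented. A follow-up clarification or counterfactual query could be handled by calling `describe_decision` on a modified state and building a new prompt from the result.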