Automated Rationale Generation: A Technique for Explainable AI and its Effects on Human Perceptions
This paper addresses the need for explainable AI that improves human trust in, and understanding of, autonomous systems, though its contribution is incremental: it applies existing methods to a specific domain.
The paper tackles the problem of generating real-time explanations for autonomous agents. It trains a neural model on human explanation data to produce different styles of rationales for an agent playing Frogger. Participants preferred detailed rationales, which helped them build a stable mental model of the agent's behavior.
Automated rationale generation is an approach to real-time explanation generation in which a computational model learns to translate an autonomous agent's internal state and action data representations into natural language. Training on human explanation data can enable agents to learn to generate human-like explanations for their behavior. In this paper, using the context of an agent that plays Frogger, we describe (a) how to collect a corpus of explanations, (b) how to train a neural rationale generator to produce different styles of rationales, and (c) how people perceive these rationales. We conducted two user studies. The first study establishes the plausibility of each type of generated rationale and situates user perceptions of them along the dimensions of confidence, human-likeness, adequate justification, and understandability. The second study further explores user preferences between the generated rationales with regard to confidence in the autonomous agent, communication of failure, and unexpected behavior. Overall, we find alignment between the intended differences in features of the generated rationales and the differences users perceived. Moreover, context permitting, participants preferred detailed rationales to form a stable mental model of the agent's behavior.
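The core idea is to treat explanation as translation: each training example pairs a serialized view of the agent's state and action (the "source") with a human-written rationale (the "target"), and a sequence-to-sequence model learns the mapping. The sketch below shows only how such parallel training pairs might be built. It assumes a toy text grid, a hypothetical cell encoding ('F' frog, 'C' car, 'L' log, 'W' water, '.' empty), a fixed window around the frog, and hypothetical helper names; none of these is the paper's actual representation, and the neural model itself is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical toy encoding (an assumption, not the paper's representation):
# 'F' = frog, 'C' = car, 'L' = log, 'W' = water, '.' = empty, '#' = off-grid
Grid = List[str]


@dataclass
class TrainingPair:
    source: List[str]  # tokens describing state + action (seq2seq input)
    target: List[str]  # tokens of the human-written rationale (seq2seq output)


def find_frog(grid: Grid) -> Tuple[int, int]:
    """Return the (row, col) of the frog on the grid."""
    for r, row in enumerate(grid):
        c = row.find("F")
        if c != -1:
            return r, c
    raise ValueError("no frog on the grid")


def encode_state_action(grid: Grid, action: str, radius: int = 1) -> List[str]:
    """Serialize the cells in a (2*radius+1)^2 window around the frog, then the action.

    Each token records the cell's offset from the frog and its symbol, so the
    model sees the frog's local surroundings rather than the whole board.
    """
    fr, fc = find_frog(grid)
    tokens: List[str] = []
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            r, c = fr + dr, fc + dc
            if 0 <= r < len(grid) and 0 <= c < len(grid[r]):
                cell = grid[r][c]
            else:
                cell = "#"  # cell lies outside the board
            tokens.append(f"{dr},{dc}:{cell}")
    tokens.append(f"ACTION:{action}")
    return tokens


def make_pair(grid: Grid, action: str, rationale: str) -> TrainingPair:
    """Build one parallel-corpus example from a game state, action, and explanation."""
    return TrainingPair(encode_state_action(grid, action), rationale.lower().split())


# Usage: one hypothetical corpus entry (row 0 is the far side of the board).
grid = ["WLLW", "..C.", ".F.."]
pair = make_pair(grid, "up", "I hopped up because the lane ahead of me is clear")
print(pair.source)
print(pair.target)
```

A corpus is then a list of such pairs collected while people watch or play the game and explain each move. Varying how much context goes into `source` (for example, a larger `radius`) is one plausible lever for producing terser versus more detailed rationales, though that mapping is an illustrative assumption here.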