Rationalization: A Neural Machine Translation Approach to Generating Natural Language Explanations
This work addresses the need for interpretable AI in domains such as gaming, though the contribution is incremental: it applies existing neural machine translation methods to a new explanation task.
The paper tackles the problem of generating natural language explanations for autonomous system behavior by using neural machine translation to translate internal agent states into text, and shows that this approach produces accurate rationalizations that are more satisfying to humans than alternative methods.
We introduce AI rationalization, an approach for generating explanations of autonomous system behavior as if a human had performed the behavior. We describe a rationalization technique that uses neural machine translation to translate internal state-action representations of an autonomous agent into natural language. We evaluate our technique in the Frogger game environment, training an autonomous game-playing agent to rationalize its action choices using natural language. A natural language training corpus is collected from human players thinking out loud as they play the game. We motivate the use of rationalization as an approach to explanation generation and show the results of two experiments evaluating the effectiveness of rationalization. Results of these evaluations show that neural machine translation is able to accurately generate rationalizations that describe agent behavior, and that rationalizations are more satisfying to humans than alternative methods of explanation.
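Treating rationalization as translation means the model needs a parallel corpus: each internal state-action representation (the "source sentence") is paired with the human think-aloud utterance recorded at that step (the "target sentence"). The Python sketch below illustrates only that pairing step under stated assumptions. The grid symbols, the square window of `radius` cells around the frog, the `WALL` padding token, the `ACTION_` prefix, and the helper names `serialize_state_action` and `build_parallel_corpus` are all hypothetical choices made for illustration. They are not the paper's actual state encoding.

```python
from typing import List, Tuple

# Hypothetical Frogger-like grid: each cell is a symbolic token (assumed encoding).
Grid = List[List[str]]


def serialize_state_action(grid: Grid, frog: Tuple[int, int], action: str,
                           radius: int = 1) -> List[str]:
    """Flatten the (2*radius+1)-square window around the frog, row by row.

    Off-grid cells are padded with a hypothetical 'WALL' token, and the chosen
    action is appended as a final 'ACTION_<NAME>' token (also an assumption).
    """
    row, col = frog
    tokens: List[str] = []
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
                tokens.append(grid[r][c])
            else:
                tokens.append("WALL")
    tokens.append("ACTION_" + action.upper())
    return tokens


def build_parallel_corpus(
    examples: List[Tuple[Grid, Tuple[int, int], str, str]]
) -> List[Tuple[List[str], List[str]]]:
    """Pair each serialized state-action sequence with its tokenized utterance."""
    corpus = []
    for grid, frog, action, utterance in examples:
        src = serialize_state_action(grid, frog, action)
        tgt = utterance.lower().split()  # naive whitespace tokenization
        corpus.append((src, tgt))
    return corpus


# Usage example with an invented state and utterance (illustrative data only).
example_grid: Grid = [
    ["WATER", "LOG", "WATER"],
    ["ROAD", "CAR", "ROAD"],
    ["GRASS", "FROG", "GRASS"],
]
example_corpus = build_parallel_corpus(
    [(example_grid, (2, 1), "wait", "Waiting for the car to pass")]
)
source, target = example_corpus[0]
print(source)
print(target)
```

The resulting (source, target) pairs are the kind of data a standard encoder-decoder translation model is trained on; the translation model itself is not shown here. A local window around the agent is used in this sketch only to keep source sequences short and fixed-length.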