Counterfactual States for Atari Agents via Generative Deep Learning
The work addresses the interpretability of deep reinforcement learning agents for non-expert users, though the contribution is incremental because it builds on existing explanation methods.
The paper tackles the problem of explaining the decisions of deep reinforcement learning agents by introducing counterfactual states for Atari games, which show the minimal changes to a game image needed for the agent to choose a different action. A user study suggests that these states help non-experts better understand the agent's decision making.
Although deep reinforcement learning agents have produced impressive results in many domains, their decision making is difficult to explain to humans. To address this problem, past work has mainly focused on explaining why an action was chosen in a given state. A different type of explanation that is useful is a counterfactual, which deals with "what if?" scenarios. In this work, we introduce the concept of a counterfactual state to help humans gain a better understanding of what would need to change (minimally) in an Atari game image for the agent to choose a different action. We introduce a novel method to create counterfactual states from a generative deep learning architecture. In addition, we evaluate the effectiveness of counterfactual states on human participants who are not machine learning experts. Our user study results suggest that our generated counterfactual states are useful in helping non-expert participants gain a better understanding of an agent's decision making process.
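To make the idea of a counterfactual state concrete, the sketch below works through a deliberately tiny example. It is not the paper's generative architecture: it assumes a hypothetical two-action linear agent over a four-value "state" so that the minimal change can be computed in closed form. All names (`q_values`, `choose_action`, `counterfactual_state`, `weights`, `margin`) are invented for illustration.

```python
# Toy illustration only: a two-action linear "agent", NOT the paper's generative model.
# All names and numbers here are hypothetical.

def q_values(weights, state):
    """One score per action: dot product of each weight row with the state."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


def choose_action(weights, state):
    """Greedy action: index of the highest score (first index wins ties)."""
    scores = q_values(weights, state)
    return max(range(len(scores)), key=scores.__getitem__)


def counterfactual_state(weights, state, target, margin=1e-3):
    """Smallest L2 change to `state` that makes `target` outscore the current action.

    For a two-action linear agent, the shortest path across the decision
    boundary is a step along (w_target - w_current).
    """
    current = choose_action(weights, state)
    if current == target:
        return list(state)  # already the desired action, nothing to change
    direction = [wt - wc for wt, wc in zip(weights[target], weights[current])]
    norm_sq = sum(d * d for d in direction)
    scores = q_values(weights, state)
    gap = scores[current] - scores[target]  # how far the target is behind
    step = (gap + margin) / norm_sq         # just enough to flip the decision
    return [s + step * d for s, d in zip(state, direction)]


# Hypothetical agent with two actions (0 and 1) and a 4-value state.
weights = [[1.0, 0.0, 0.5, 0.0],
           [0.0, 1.0, 0.0, 0.5]]
state = [0.8, 0.2, 0.6, 0.1]

cf = counterfactual_state(weights, state, target=1)
print("original action:", choose_action(weights, state))
print("counterfactual action:", choose_action(weights, cf))
print("change per value:", [round(c - s, 4) for c, s in zip(cf, state)])
```

The printed per-value change plays the role of the "what would need to change" explanation shown to a user. For an image-based deep agent no such closed form exists, which is presumably why the paper turns to a generative model to produce counterfactual states.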