AI · HC · LG · Jan 28, 2020

Distal Explanations for Model-free Explainable Reinforcement Learning

arXiv:2001.10284v2 · 28 citations
AI Analysis

This work addresses the need for explainable AI in reinforcement learning, particularly for human-agent interaction, though it is incremental as it builds on existing causal and counterfactual methods.

The paper tackles the problem of generating 'why' and 'why not' explanations for model-free reinforcement learning agents by introducing a distal explanation model that combines causal models, opportunity chains, decision trees, and a recurrent neural network. The model is evaluated computationally on 6 reinforcement learning benchmarks, and in a separate study with 90 human participants it yields improved outcomes over three scenarios compared with two baseline explanation models.

In this paper we introduce and evaluate a distal explanation model for model-free reinforcement learning agents that can generate explanations for 'why' and 'why not' questions. Our starting point is the observation that causal models can generate opportunity chains that take the form of 'A enables B and B causes C'. Using insights from an analysis of 240 explanations generated in a human-agent experiment, we define a distal explanation model that can analyse counterfactuals and opportunity chains using decision trees and causal models. A recurrent neural network is employed to learn opportunity chains, and decision trees are used to improve the accuracy of task prediction and the generated counterfactuals. We computationally evaluate the model in 6 reinforcement learning benchmarks using different reinforcement learning algorithms. From a study with 90 human participants, we show that our distal explanation model results in improved outcomes over three scenarios compared with two baseline explanation models.
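To make the 'A enables B and B causes C' pattern concrete, the sketch below enumerates such chains in a tiny hand-written causal graph. It is an illustration only, not the authors' implementation: the edge labels, the variable names (`has_key`, `door_open`, `reach_goal`, `has_map`), and the helpers `opportunity_chains` and `explain` are all hypothetical. The abstract says the paper learns opportunity chains with a recurrent neural network, whereas this sketch simply looks them up.

```python
from typing import Dict, List, Tuple

# Hypothetical toy causal graph: (source, target) -> relation label.
# These variables are invented for illustration and do not come from the paper.
EDGES: Dict[Tuple[str, str], str] = {
    ("has_map", "has_key"): "enables",
    ("has_key", "door_open"): "enables",
    ("door_open", "reach_goal"): "causes",
}


def opportunity_chains(
    edges: Dict[Tuple[str, str], str],
) -> List[Tuple[str, str, str]]:
    """Return every (A, B, C) such that A enables B and B causes C."""
    chains = []
    for (a, b), rel_ab in edges.items():
        if rel_ab != "enables":
            continue
        # Look for a causal edge that starts where the enabling edge ends.
        for (b2, c), rel_bc in edges.items():
            if b2 == b and rel_bc == "causes":
                chains.append((a, b, c))
    return chains


def explain(chain: Tuple[str, str, str]) -> str:
    """Render a chain in the 'A enables B and B causes C' form."""
    a, b, c = chain
    return f"{a} enables {b}, and {b} causes {c}"


if __name__ == "__main__":
    for chain in opportunity_chains(EDGES):
        print(explain(chain))
```

In this toy graph only `has_key -> door_open -> reach_goal` qualifies, because the edge out of `has_key` is itself an enabling edge rather than a causal one, so `has_map` does not start a chain.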

