AI · Oct 11, 2020

Contrastive Explanations for Reinforcement Learning via Embedded Self Predictions

arXiv:2010.05180v2 · 29 citations
Originality: Incremental advance
AI Analysis

This paper addresses the need for interpretable reinforcement learning, particularly for users who require explanations of agent decisions. The advance is incremental because it builds on existing explanation methods.

The paper tackles the problem of explaining action preferences in deep reinforcement learning by learning action-values through human-understandable features, enabling contrastive explanations via predicted future properties. The results show that ESP models can be effectively learned and provide insightful explanations in three domains, including a complex strategy game.

We investigate a deep reinforcement learning (RL) architecture that supports explaining why a learned agent prefers one action over another. The key idea is to learn action-values that are directly represented via human-understandable properties of expected futures. This is realized via the embedded self-prediction (ESP) model, which learns said properties in terms of human-provided features. Action preferences can then be explained by contrasting the future properties predicted for each action. To address cases where there are a large number of features, we develop a novel method for computing minimal sufficient explanations from an ESP. Our case studies in three domains, including a complex strategy game, show that ESP models can be effectively learned and support insightful explanations.
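The contrastive idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several simplifying assumptions: the predicted future properties for each action are already available as plain vectors, the action-value is a linear combination of them (the `weights` vector here is hypothetical), and a minimal sufficient explanation is taken to be the smallest set of positive contributions whose sum outweighs the total of the negative ones. The function names and the example numbers are invented for illustration.

```python
import numpy as np

def contrast(weights, feats_a, feats_b):
    """Per-feature contribution to Q(s, a) - Q(s, b), assuming a linear
    combiner Q = weights . feats (an assumption made for this sketch)."""
    weights = np.asarray(weights, dtype=float)
    diff = np.asarray(feats_a, dtype=float) - np.asarray(feats_b, dtype=float)
    return weights * diff

def minimal_sufficient_explanation(contributions):
    """Indices of the fewest positive contributions whose sum exceeds the
    magnitude of all negative contributions combined."""
    contributions = np.asarray(contributions, dtype=float)
    if contributions.sum() <= 0:
        raise ValueError("action a is not preferred; nothing to explain")
    negative_total = -contributions[contributions < 0].sum()
    # Largest contributions first; a stable sort keeps ties in index order.
    order = np.argsort(-contributions, kind="stable")
    chosen, running = [], 0.0
    for i in order:
        chosen.append(int(i))
        running += contributions[i]
        if running > negative_total:
            break
    return chosen

# Hypothetical predicted future properties for two candidate actions.
weights = [1.0, 2.0, 0.5, 1.0]
feats_a = [3.0, 1.0, 3.0, 0.0]
feats_b = [1.0, 0.5, 2.0, 2.5]

deltas = contrast(weights, feats_a, feats_b)
msx = minimal_sufficient_explanation(deltas)
print(deltas, msx)
```

In this toy example, features 0 and 1 alone are enough to outweigh the single feature that favors the other action, so the explanation can omit feature 2. With a nonlinear combiner, the per-feature contributions would have to come from an attribution method rather than a simple product.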

Code Implementations: 1 repo
