LG · May 4, 2023

Explainable Reinforcement Learning via a Causal World Model

arXiv:2305.02749v5 · 30 citations
Originality: Incremental advance
AI Analysis

This paper addresses interpretability in RL for practitioners who need transparent decision-making, though it appears incremental because it builds on existing causal and explanatory methods.

The paper tackles the challenge of generating explanations for reinforcement learning by developing a causal world model that captures long-term action effects, resulting in improved explainability while maintaining accuracy for model-based learning.

Generating explanations for reinforcement learning (RL) is challenging as actions may produce long-term effects on the future. In this paper, we develop a novel framework for explainable RL by learning a causal world model without prior knowledge of the causal structure of the environment. The model captures the influence of actions, allowing us to interpret the long-term effects of actions through causal chains, which present how actions influence environmental variables and finally lead to rewards. Different from most explanatory models which suffer from low accuracy, our model remains accurate while improving explainability, making it applicable in model-based learning. As a result, we demonstrate that our causal model can serve as the bridge between explainability and learning.
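To make the idea of a causal chain concrete, here is a minimal sketch. It assumes a causal graph has already been learned and is available as a plain adjacency mapping. The variable names (`action:open_valve`, `water_level`, and so on) and the helper `causal_chains` are hypothetical and chosen for illustration only; this is not the authors' implementation, and it ignores the time-step structure a real world model would have.

```python
from typing import Dict, List

# Hypothetical causal graph: each key directly influences the variables listed.
# All names here are illustrative assumptions, not taken from the paper.
CAUSAL_GRAPH: Dict[str, List[str]] = {
    "action:open_valve": ["water_level"],
    "water_level": ["pressure", "temperature"],
    "pressure": ["reward"],
    "temperature": ["reward"],
}


def causal_chains(
    graph: Dict[str, List[str]], source: str, target: str = "reward"
) -> List[List[str]]:
    """Enumerate every directed path from `source` to `target` by depth-first search."""
    chains: List[List[str]] = []

    def dfs(node: str, path: List[str]) -> None:
        if node == target:
            chains.append(path)
            return
        for child in graph.get(node, []):
            if child not in path:  # skip nodes already on the path to avoid cycles
                dfs(child, path + [child])

    dfs(source, [source])
    return chains


chains = causal_chains(CAUSAL_GRAPH, "action:open_valve")
for chain in chains:
    # Each printed line is one chain from the action to the reward.
    print(" -> ".join(chain))
```

Each path returned is one candidate explanation of how the action eventually affects the reward, which is the role the abstract assigns to causal chains.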

Code Implementations: 1 repo
