AI, LG | Nov 16, 2021

Causal policy ranking

arXiv:2111.08415v1 | 1 citation
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of understanding which decisions in RL policies contribute to rewards. The contribution is incremental: it builds on existing causal methods for policy interpretation.

The paper tackles the problem of interpreting complex reinforcement learning policies by proposing a black-box method that uses counterfactual reasoning to estimate and rank the causal effects of decisions on reward attainment, comparing it to a non-causal alternative to highlight benefits.

Policies trained via reinforcement learning (RL) are often very complex even for simple tasks. In an episode with $n$ time steps, a policy will make $n$ decisions on actions to take, many of which may appear non-intuitive to the observer. Moreover, it is not clear which of these decisions directly contribute towards achieving the reward, or how significant their contribution is. Given a trained policy, we propose a black-box method based on counterfactual reasoning that estimates the causal effect that these decisions have on reward attainment and ranks the decisions according to this estimate. In this preliminary work, we compare our measure against an alternative, non-causal, ranking procedure, highlight the benefits of causality-based policy ranking, and discuss potential future work integrating causal algorithms into the interpretation of RL agent policies.
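The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea of counterfactual decision ranking, not the paper's actual procedure. Everything in it is an assumption made for illustration: the toy corridor environment, the door that closes at a fixed time, the hand-written `policy`, and the choice to measure a decision's effect as the reward lost when that single decision is flipped while the policy controls every other step.

```python
from typing import List, Optional, Tuple

# Hypothetical toy task: walk from cell 0 to cell GOAL within HORIZON steps.
# A door at cell DOOR can only be entered before time DOOR_CLOSES_AT.
GOAL, DOOR, DOOR_CLOSES_AT, HORIZON = 3, 2, 2, 6


def step(pos: int, t: int, action: int) -> int:
    """Deterministic transition. Actions: 0 = stay, 1 = move right."""
    if action == 0 or pos >= GOAL:
        return pos
    nxt = pos + 1
    if nxt == DOOR and t >= DOOR_CLOSES_AT:
        return pos  # the door is shut, so the move fails
    return nxt


def policy(pos: int) -> int:
    """Stand-in for a trained policy, queried as a black box."""
    return 1 if pos < GOAL else 0


def rollout(flip_at: Optional[int] = None) -> float:
    """Run one episode; optionally replace the decision at step flip_at."""
    pos = 0
    for t in range(HORIZON):
        action = policy(pos)
        if t == flip_at:
            action = 1 - action  # counterfactual: take the other action
        pos = step(pos, t, action)
    return 1.0 if pos >= GOAL else 0.0


def rank_decisions() -> List[Tuple[int, float]]:
    """Rank time steps by the reward lost when that decision is flipped."""
    baseline = rollout()
    effects = [(t, baseline - rollout(flip_at=t)) for t in range(HORIZON)]
    # sorted() is stable, so tied steps stay in chronological order.
    return sorted(effects, key=lambda te: te[1], reverse=True)


if __name__ == "__main__":
    for t, effect in rank_decisions():
        print(f"step {t}: estimated effect on reward = {effect}")
```

In this toy setting only the first two decisions matter, because they get the agent through the door before it closes; a flipped decision at any later step is either recovered by the policy or has no consequence. A stochastic environment or policy would need the effect averaged over many rollouts rather than read off a single one.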

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
