LG · AI · SY · Jan 16, 2025

From Explainability to Interpretability: Interpretable Policies in Reinforcement Learning Via Model Explanation

arXiv:2501.09858v1 · 1 citation · h-index: 5
Originality: Incremental advance
AI Analysis

The work addresses the challenge of trusting decision-making in high-stakes RL applications, although it is an incremental advance because it builds on existing explainable RL methods.

The paper tackles the problem of understanding deep reinforcement learning policies. It proposes a model-agnostic approach that uses Shapley values to transform these policies into transparent representations, and it shows that the approach preserves performance and yields more stable interpretable policies in classic control environments.
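For reference, the standard Shapley value, written here for a policy whose output $f$ depends on $d$ state features $N = \{1, \dots, d\}$, assigns feature $i$ at state $s$ its average marginal contribution over all coalitions $S$ of the other features. The summary does not say how the paper defines the coalition value $v_s(S)$. Evaluating $f$ with the features outside $S$ replaced by a baseline state $b$ is an assumption made here for illustration only.

```latex
\phi_i(s) = \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,(d - |S| - 1)!}{d!}
  \bigl[ v_s(S \cup \{i\}) - v_s(S) \bigr],
\qquad
v_s(S) = f\bigl(s_S, b_{N \setminus S}\bigr),
\qquad
\sum_{i=1}^{d} \phi_i(s) = f(s) - f(b).
```

The last identity (efficiency) holds for any choice of $v_s$ with $v_s(N) = f(s)$ and $v_s(\emptyset) = f(b)$.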

Deep reinforcement learning (RL) has shown remarkable success in complex domains; however, the inherent black-box nature of deep neural network policies raises significant challenges in understanding and trusting their decision-making processes. While existing explainable RL methods provide local insights, they fail to deliver a global understanding of the model, particularly in high-stakes applications. To overcome this limitation, we propose a novel model-agnostic approach that bridges the gap between explainability and interpretability by leveraging Shapley values to transform complex deep RL policies into transparent representations. The proposed approach offers two key contributions: a novel approach that employs Shapley values for policy interpretation beyond local explanations, and a general framework applicable to both off-policy and on-policy algorithms. We evaluate our approach with three existing deep RL algorithms and validate its performance in two classic control environments. The results demonstrate that our approach not only preserves the original models' performance but also generates more stable interpretable policies.
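To make the step from local explanations toward a global view concrete, here is a minimal sketch. It is not the authors' method, whose details are not given in the abstract. It computes exact Shapley values for a single state by enumerating coalitions, then averages their absolute values over many states to rank features globally. All of the following are assumptions for illustration: the hypothetical toy policy `policy_score`, its weights `WEIGHTS`, the CartPole-like four-feature state, using the mean state as the baseline for "absent" features, and mean absolute Shapley value as the aggregation.

```python
import itertools
import math

import numpy as np

# Hypothetical stand-in for a trained deep RL policy: maps a CartPole-like
# 4-feature state (cart position, cart velocity, pole angle, pole angular
# velocity) to a scalar preference for one action. Weights are made up.
WEIGHTS = np.array([0.1, 0.5, 3.0, 1.0])


def policy_score(state: np.ndarray) -> float:
    return float(np.tanh(state @ WEIGHTS))


def shapley_values(f, x: np.ndarray, baseline: np.ndarray) -> np.ndarray:
    """Exact Shapley values of f at x by enumerating all coalitions.

    Features outside a coalition take their baseline value; this is one of
    several possible definitions of an "absent" feature (an assumption).
    """
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, size):
                z = baseline.astype(float).copy()
                for j in subset:
                    z[j] = x[j]  # coalition members take their real values
                without_i = f(z)
                z[i] = x[i]      # add feature i to the coalition
                with_i = f(z)
                phi[i] += weight * (with_i - without_i)
    return phi


def global_importance(f, states: np.ndarray, baseline: np.ndarray) -> np.ndarray:
    """Aggregate local explanations: mean |Shapley value| per feature."""
    return np.mean([np.abs(shapley_values(f, s, baseline)) for s in states], axis=0)


# Usage: sample states, use their mean as the baseline, rank features globally.
rng = np.random.default_rng(0)
states = rng.normal(scale=0.1, size=(200, 4))
baseline = states.mean(axis=0)
importance = global_importance(policy_score, states, baseline)
print(importance)
```

Exact enumeration costs on the order of $d \cdot 2^{d-1}$ coalition pairs per state, which is cheap for the four to six state features of classic control tasks. Larger state spaces would typically call for sampling-based Shapley estimates instead. In this toy policy the pole-angle feature dominates the ranking because it carries the largest weight.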
