Policy Optimization with Sparse Global Contrastive Explanations
This work addresses the need for interpretable policy modifications in RL, but it appears incremental because it builds on existing RL and explanation methods.
The paper addresses how to improve an existing behavior policy in Reinforcement Learning through minimal, user-interpretable changes, and proposes a framework that enforces a sparse global contrastive explanation between the original and the improved policy.
We develop a Reinforcement Learning (RL) framework for improving an existing behavior policy via sparse, user-interpretable changes. Our goal is to make minimal changes while gaining as much benefit as possible. We define a minimal change as one that admits a sparse, global contrastive explanation between the original and the proposed policy. We improve the current policy under the constraint that this global contrastive explanation stays short. We demonstrate our framework on a discrete MDP and a continuous 2D navigation domain.
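The idea of improving a policy while keeping the change short can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a small tabular MDP, a deterministic behavior policy, and a deliberately simplified stand-in for explanation sparsity, namely the number of states whose action differs from the behavior policy. The names `evaluate`, `sparse_improve`, and the budget `k` are hypothetical and introduced only for this illustration.

```python
import numpy as np


def evaluate(P, R, policy, gamma=0.9):
    """Exact value of a deterministic tabular policy.

    P: (S, A, S) transition probabilities, R: (S, A) rewards,
    policy: (S,) integer action per state.
    """
    S = P.shape[0]
    idx = np.arange(S)
    P_pi = P[idx, policy]  # (S, S) transitions under the policy
    r_pi = R[idx, policy]  # (S,) rewards under the policy
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)


def sparse_improve(P, R, behavior, k, gamma=0.9, start=None):
    """Greedily change the action in at most k states (assumed sparsity proxy)."""
    S, A = R.shape
    start = np.full(S, 1.0 / S) if start is None else start
    policy = behavior.copy()
    changed = []
    for _ in range(k):
        base = start @ evaluate(P, R, policy, gamma)
        best_gain, best_edit = 1e-12, None
        for s in range(S):
            if s in changed:
                continue  # each state is edited at most once
            for a in range(A):
                if a == policy[s]:
                    continue
                cand = policy.copy()
                cand[s] = a
                gain = start @ evaluate(P, R, cand, gamma) - base
                if gain > best_gain:
                    best_gain, best_edit = gain, (s, a)
        if best_edit is None:
            break  # no single-state edit improves the expected return
        s, a = best_edit
        policy[s] = a
        changed.append(s)
    return policy, changed


# Toy 4-state chain: action 0 moves left, action 1 moves right.
# Taking "right" in the last state yields reward 1 and stays there.
S, A = 4, 2
P = np.zeros((S, A, S))
for s in range(S):
    P[s, 0, max(s - 1, 0)] = 1.0
    P[s, 1, min(s + 1, S - 1)] = 1.0
R = np.zeros((S, A))
R[S - 1, 1] = 1.0

behavior = np.zeros(S, dtype=int)  # behavior policy: always move left
policy, changed = sparse_improve(P, R, behavior, k=2)
print("states changed:", changed)
print("new policy:", policy.tolist())
```

The budget `k` plays the role of the explanation-length constraint: a user only has to read the list of edited states to see how the proposed policy differs from the original. A real contrastive explanation may summarize changes at a coarser level than individual states, so this proxy should be read as an assumption of the sketch rather than the paper's definition.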