AI · LG · Mar 8, 2023

RACCER: Towards Reachable and Certain Counterfactual Explanations for Reinforcement Learning

arXiv:2303.04475v2 · 7 citations · h-index: 23
AI Analysis

This work addresses the need for interpretable, actionable explanations in RL systems, which is crucial for trust and real-world deployment. It appears incremental, however, because it builds on existing counterfactual explanation methods and adapts them to RL.

The authors tackle the problem of generating counterfactual explanations for reinforcement learning agents, which are often black boxes and hard to trust. They propose RACCER, an RL-specific method that produces reachable and certain counterfactuals, and show in a user study that it helps users understand agent behavior better than state-of-the-art approaches.

While reinforcement learning (RL) algorithms have been successfully applied to numerous tasks, their reliance on neural networks makes their behavior difficult to understand and trust. Counterfactual explanations are human-friendly explanations that offer users actionable advice on how to alter the model inputs to achieve the desired output from a black-box system. However, current approaches to generating counterfactuals in RL ignore the stochastic and sequential nature of RL tasks and can produce counterfactuals that are difficult to obtain or do not deliver the desired outcome. In this work, we propose RACCER, the first RL-specific approach to generating counterfactual explanations for the behavior of RL agents. We first propose and implement a set of RL-specific counterfactual properties that ensure easily reachable counterfactuals with highly probable desired outcomes. We use a heuristic tree search of the agent's execution trajectories to find the most suitable counterfactuals based on the defined properties. We evaluate RACCER on two tasks and conduct a user study showing that RL-specific counterfactuals help users better understand agents' behavior compared to the current state-of-the-art approaches.
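To make "reachable" and "certain" concrete, here is a minimal sketch under stated assumptions; it is not RACCER's implementation. Everything in it is hypothetical: a toy 1-D corridor (`N_STATES`, `SLIP`), a stand-in black-box policy (`agent_policy`), and proxy scores where reachability is the length of the action sequence from the current state and certainty is the exact probability, under noisy transitions, that the agent chooses the desired alternative action once that sequence has been executed. A plain best-first search over action sequences stands in for the paper's heuristic tree search, and the weights `w_len` and `w_unc` are arbitrary.

```python
import heapq
import itertools
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical toy setup (not from the paper): positions 0..N_STATES-1 on a corridor.
State = int
Action = int
LEFT, RIGHT = 0, 1
ACTIONS = (LEFT, RIGHT)
N_STATES = 10
SLIP = 0.2  # hypothetical: probability that an action has no effect


def agent_policy(state: State) -> Action:
    """Stand-in for the black-box agent: move right until position 6, then left."""
    return RIGHT if state < 6 else LEFT


def propagate(dist: Dict[State, float], action: Action) -> Dict[State, float]:
    """Push a probability distribution over states through one noisy action."""
    out: Dict[State, float] = defaultdict(float)
    delta = 1 if action == RIGHT else -1
    for s, p in dist.items():
        moved = min(max(s + delta, 0), N_STATES - 1)
        out[moved] += p * (1 - SLIP)  # action succeeds
        out[s] += p * SLIP            # action slips, state unchanged
    return dict(out)


def certainty(start: State, plan: List[Action], target: Action) -> float:
    """Exact probability that the agent picks `target` after executing `plan`."""
    dist = {start: 1.0}
    for a in plan:
        dist = propagate(dist, a)
    return sum(p for s, p in dist.items() if agent_policy(s) == target)


def search_counterfactual(
    start: State,
    target: Action,
    max_depth: int = 6,
    min_certainty: float = 0.8,
    w_len: float = 1.0,
    w_unc: float = 5.0,
) -> Optional[Tuple[List[Action], float]]:
    """Best-first search over action sequences starting from `start`.

    Priority = w_len * len(plan) + w_unc * (1 - certainty): shorter plans are
    treated as more reachable, higher certainty as a more probable outcome.
    Returns the first popped plan whose certainty reaches `min_certainty`.
    """
    tie = itertools.count()  # unique tie-breaker so heapq never compares lists
    p0 = certainty(start, [], target)
    frontier = [(w_unc * (1 - p0), next(tie), [], p0)]
    while frontier:
        _, _, plan, p = heapq.heappop(frontier)
        if p >= min_certainty:
            return plan, p
        if len(plan) >= max_depth:
            continue  # depth limit reached, do not expand further
        for a in ACTIONS:
            child = plan + [a]
            pc = certainty(start, child, target)
            cost = w_len * len(child) + w_unc * (1 - pc)
            heapq.heappush(frontier, (cost, next(tie), child, pc))
    return None


if __name__ == "__main__":
    # "Why did the agent go right at position 2 and not left?"
    print(search_counterfactual(start=2, target=LEFT))
```

In this toy setting the search returns six `RIGHT` moves with certainty of about 0.90. Five moves are not enough: the agent ends at position 6 or beyond with probability of only about 0.74, below the hypothetical `min_certainty` of 0.8. Raising `w_len` favors shorter, more reachable counterfactuals, while raising `w_unc` or `min_certainty` favors more certain ones. Trading these two off against each other is the kind of balance the abstract's RL-specific properties are meant to capture.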

Code Implementations: 1 repo