Backward explanations via redefinition of predicates
This work addresses the challenge of providing interpretable explanations for RL agents over complex, long-term interactions, though the contribution appears incremental, since it builds on the existing HXP framework.
The paper tackles the problem of explaining long sequences of actions in reinforcement learning by proposing Backward-HXP, a method that avoids approximating action importance scores, whose exact computation is intractable for long histories, and shows experimentally that it can summarize long histories effectively.
History eXplanation based on Predicates (HXP) studies the behavior of a Reinforcement Learning (RL) agent over a sequence of its interactions with the environment (a history), through the prism of an arbitrary predicate. To this end, an action importance score is computed for each action in the history, and the explanation consists of displaying the most important actions to the user. Because computing an action's importance is #W[1]-hard, the scores must be approximated for long histories, at the expense of their quality. We therefore propose a new HXP method, called Backward-HXP (B-HXP), which provides explanations for such histories without approximating scores. Experiments show the ability of B-HXP to summarise long histories.
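To make the mechanism more concrete, here is a minimal Python sketch on a toy problem. It is not the authors' implementation. The corridor MDP, the always-move-right policy, and every name in it (`transition`, `prob_predicate`, `importance`, `backward_explain`, `window`) are hypothetical. The importance score is a simplified assumption: the probability that the predicate holds at the end of the horizon when the agent plays the recorded action and then follows its policy, minus the average of that probability over the alternative actions. Exact enumeration of outcomes grows exponentially with the horizon, which illustrates why long histories are costly. The backward loop is our reading of the title's "redefinition of predicates". It scores only the actions inside a short window at the end of the remaining history and keeps the most important one. It then replaces the predicate with "reach the state in which that action was taken" before moving further back.

```python
from typing import Callable, List, Tuple

# Hypothetical toy environment: a 1D corridor with positions 0..N-1.
N = 6
ACTIONS = ("L", "R", "S")  # left, right, stay
SLIP = 0.2                 # probability that a move fails and the agent stays put


def transition(s: int, a: str) -> List[Tuple[float, int]]:
    """Return (probability, next_state) pairs for taking action a in state s."""
    if a == "S":
        return [(1.0, s)]
    target = min(N - 1, max(0, s + (1 if a == "R" else -1)))
    return [(1.0 - SLIP, target), (SLIP, s)]


def prob_predicate(s: int, horizon: int, predicate: Callable, policy: Callable) -> float:
    """Exact probability that `predicate` holds after following `policy` for `horizon` steps."""
    if horizon == 0:
        return 1.0 if predicate(s) else 0.0
    return sum(p * prob_predicate(nxt, horizon - 1, predicate, policy)
               for p, nxt in transition(s, policy(s)))


def importance(s: int, action: str, horizon: int, predicate: Callable, policy: Callable) -> float:
    """Simplified score (an assumption): value of `action` minus mean value of the alternatives."""
    def value(a: str) -> float:
        # Play `a` first, then follow the policy for the remaining horizon - 1 steps.
        return sum(p * prob_predicate(nxt, horizon - 1, predicate, policy)
                   for p, nxt in transition(s, a))
    others = [b for b in ACTIONS if b != action]
    return value(action) - sum(value(b) for b in others) / len(others)


def backward_explain(history, final_state, predicate, policy, window=3):
    """Walk the history backwards, one bounded window at a time.

    `history` is a list of (state, action) pairs. Returns (index, score) pairs
    of the selected actions, ordered from the last selected step to the first.
    """
    states = [s for s, _ in history] + [final_state]
    end = len(history)
    selected = []
    while end > 0:
        start = max(0, end - window)
        # The horizon of action i is the number of steps until the window end.
        scores = {i: importance(states[i], history[i][1], end - i, predicate, policy)
                  for i in range(start, end)}
        best = max(scores, key=scores.get)
        selected.append((best, scores[best]))
        # Hypothetical redefinition: the new predicate is "reach the state where
        # the selected action was played", which becomes the next window's goal.
        predicate = lambda s, target=states[best]: s == target
        end = best
    return selected


if __name__ == "__main__":
    goal = lambda s: s == N - 1
    always_right = lambda s: "R"
    # One recorded history of (state, action) pairs; the move at index 2 slipped.
    history = [(0, "R"), (1, "R"), (2, "R"), (2, "R"), (3, "R"), (4, "R")]
    for idx, score in backward_explain(history, N - 1, goal, always_right):
        print(f"step {idx}: action {history[idx][1]!r} from state {history[idx][0]}, "
              f"importance {score:.3f}")
```

In this sketch every score is computed exactly, and the enumeration cost is exponential only in `window`, not in the length of the history. On the sample history, the slipped move at index 2 is the only step that is not selected.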