EXPIL: Explanatory Predicate Invention for Learning in Games
EXPIL addresses the black-box reasoning of RL game-playing agents and offers a more scalable approach to interpretability, though its reduction of background knowledge requirements is incremental.
The paper tackles the problem of interpretability in reinforcement learning agents by proposing EXPIL, which extracts predicates from pretrained neural agents to reduce dependency on predefined background knowledge, achieving explainable behavior in logic agents.
Reinforcement learning (RL) has proven to be a powerful tool for training agents that excel in various games. However, the black-box nature of neural network models often hinders our ability to understand the reasoning behind an agent's actions. Recent research has attempted to address this issue by using the guidance of pretrained neural agents to encode logic-based policies, allowing for interpretable decisions. A drawback of such approaches is that they require large amounts of predefined background knowledge in the form of predicates, which limits their applicability and scalability. In this work, we propose a novel approach, Explanatory Predicate Invention for Learning in Games (EXPIL), that identifies and extracts predicates from a pretrained neural agent. These predicates are later used in logic-based agents, reducing the dependency on predefined background knowledge. Our experimental evaluation on various games demonstrates the effectiveness of EXPIL in achieving explainable behavior in logic agents while requiring less background knowledge.