AI · LG · May 23, 2024

Interpretable and Editable Programmatic Tree Policies for Reinforcement Learning

Hector Kohler, Quentin Delfosse, Riad Akrour, Kristian Kersting, Philippe Preux

arXiv:2405.14956v117.221 citations · h-index: 15 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for trustworthy, interpretable policies in reinforcement learning, particularly for real-world deployment. The advance is incremental: it builds on existing distillation and interpretability techniques.

The paper tackles the problem of goal misalignments in deep reinforcement learning by proposing INTERPRETER, a method that produces interpretable and editable tree policies, which match oracles across diverse tasks and enable correction of misalignments in Atari games and real farming strategies.

Deep reinforcement learning agents are prone to goal misalignments. The black-box nature of their policies hinders the detection and correction of such misalignments, and undermines the trust necessary for real-world deployment. So far, solutions that learn interpretable policies are inefficient or require many human priors. We propose INTERPRETER, a fast distillation method producing INTerpretable Editable tRee Programs for ReinforcEmenT lEaRning. We empirically demonstrate that INTERPRETER's compact tree programs match oracles across a diverse set of sequential decision tasks, and we evaluate the impact of our design choices on interpretability and performance. We show that our policies can be interpreted and edited to correct misalignments on Atari games and to explain real farming strategies.
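To make the idea of distilling an oracle into an editable tree program concrete, here is a minimal sketch. It is not the authors' INTERPRETER algorithm: it fits a plain greedy, axis-aligned decision tree to state-action pairs labelled by an oracle, then prints the tree as ordinary Python source that a person can read and edit. The `oracle` function, the two state features `x` and `v`, the thresholds, and all helper names are hypothetical and chosen only for illustration.

```python
import random


def oracle(state):
    # Hypothetical stand-in for a trained black-box agent (an assumption,
    # not a policy from the paper). Actions are the integers 0, 1, 2.
    x, v = state
    if x < 0.3:
        return 0
    return 1 if v < -0.2 else 2


def gini(labels):
    # Gini impurity of a list of action labels (1.0 for an empty list).
    n = len(labels)
    return 1.0 - sum((labels.count(a) / n) ** 2 for a in set(labels))


def fit(data, depth):
    # Greedy axis-aligned tree. A leaf is an action; an internal node is
    # a tuple (feature_index, threshold, left_subtree, right_subtree).
    labels = [a for _, a in data]
    majority = max(set(labels), key=labels.count)
    if depth == 0 or len(set(labels)) == 1:
        return majority
    best = None
    for f in range(len(data[0][0])):
        values = sorted({s[f] for s, _ in data})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [a for s, a in data if s[f] < t]
            right = [a for s, a in data if s[f] >= t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return majority
    _, f, t = best
    return (f, t,
            fit([d for d in data if d[0][f] < t], depth - 1),
            fit([d for d in data if d[0][f] >= t], depth - 1))


def to_source(node, names=("x", "v"), indent="    "):
    # Render the tree as nested if/else statements in Python syntax.
    if not isinstance(node, tuple):
        return f"{indent}return {node}\n"
    f, t, left, right = node
    return (f"{indent}if {names[f]} < {t!r}:\n"
            + to_source(left, names, indent + "    ")
            + f"{indent}else:\n"
            + to_source(right, names, indent + "    "))


def compile_policy(source):
    # Turn the program text back into a callable policy.
    namespace = {}
    exec(source, namespace)
    return namespace["tree_policy"]


rng = random.Random(0)
states = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(300)]
tree = fit([(s, oracle(s)) for s in states], depth=2)
program = "def tree_policy(x, v):\n" + to_source(tree)
print(program)

# Measure how often the distilled program agrees with the oracle.
tree_policy = compile_policy(program)
test_states = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(1000)]
agreement = sum(tree_policy(*s) == oracle(s) for s in test_states) / len(test_states)
print(f"agreement with oracle: {agreement:.3f}")

# "Editing" the policy is a plain text edit: here every leaf that returned
# action 1 is rewritten to return action 2 instead.
edited_policy = compile_policy(program.replace("return 1", "return 2"))
```

Because the distilled policy is just source text, a misaligned branch can be inspected and changed by hand, which is the kind of correction the abstract describes. The axis-aligned splits, the Gini criterion, and the one-shot data collection used here are simplifying assumptions.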
