AI · Jan 13, 2020

Exploiting Language Instructions for Interpretable and Compositional Reinforcement Learning

arXiv:2001.04418v1 · 4 citations
AI Analysis

This work addresses the need for explainable agents in automated decision processes, though it appears incremental with limited performance gains.

The authors aim to make reinforcement learning agents compositional and interpretable by training a diagnostic classifier to identify, from the agent's latent space, which part of a complex language instruction the agent is currently pursuing. The classifier makes the hidden states easier to interpret, but it also shifts zero-shot performance on novel instructions.

In this work, we present an alternative approach to making an agent compositional through the use of a diagnostic classifier. Because of the need for explainable agents in automated decision processes, we attempt to interpret the latent space of an RL agent to identify its current objective in a complex language instruction. Results show that the classification process causes changes in the hidden states that make them more easily interpretable, but also causes a shift in zero-shot performance on novel instructions. Lastly, we limit the supervisory signal on the classification, and observe a similar but less notable effect.
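
To make the idea of a diagnostic classifier concrete, here is a minimal sketch of a linear probe that predicts a "current objective" label from hidden states. It is not the authors' implementation: the hidden states are synthetic stand-ins for an RL agent's latent vectors, the names `make_synthetic_states`, `train_probe`, and `probe_accuracy` are hypothetical, and the probe is fitted after the fact, whereas the abstract describes a classification signal that itself changes the hidden states during training.

```python
import numpy as np

def make_synthetic_states(n=600, d=16, n_classes=3, seed=0):
    """ASSUMPTION: stand-in for agent hidden states, one cluster per sub-objective."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=3.0, size=(n_classes, d))
    labels = rng.integers(0, n_classes, size=n)
    hidden = centers[labels] + rng.normal(size=(n, d))
    return hidden, labels

def train_probe(hidden, labels, n_classes, lr=0.1, epochs=200):
    """Fit a linear softmax probe with plain gradient descent on cross-entropy."""
    n, d = hidden.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = hidden @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * hidden.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def probe_accuracy(W, b, hidden, labels):
    """Fraction of states whose predicted objective matches the label."""
    return float(((hidden @ W + b).argmax(axis=1) == labels).mean())

hidden, labels = make_synthetic_states()
W, b = train_probe(hidden[:400], labels[:400], n_classes=3)
acc = probe_accuracy(W, b, hidden[400:], labels[400:])
print(f"held-out probe accuracy: {acc:.2f}")
```

High held-out accuracy for such a probe would suggest that the current objective is linearly decodable from the latent space, which is the sense of "interpretable" used in the abstract.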

