LG ML · Sep 15, 2018

Towards Better Interpretability in Deep Q-Networks

arXiv:1809.05630v2 · 12.569 citations

Originality: Incremental advance

AI Analysis

This work addresses the need among researchers and practitioners for better interpretability in deep reinforcement learning, although it is an incremental advance that builds on existing Q-learning methods.

The paper tackles interpretability in deep Q-networks by proposing an interpretable neural network architecture that provides global explanations using key-value memories, attention, and reconstructible embeddings. The model achieves training rewards comparable to state-of-the-art models, but out-of-sample testing reveals shallow features and overfitting.

Deep reinforcement learning techniques have demonstrated superior performance in a wide variety of environments. While training algorithms continue to improve at a brisk pace, theoretical or empirical studies of what these networks actually learn lag far behind. In this paper, we propose an interpretable neural network architecture for Q-learning that provides a global explanation of the model's behavior using key-value memories, attention, and reconstructible embeddings. With a directed exploration strategy, our model reaches training rewards comparable to those of state-of-the-art deep Q-learning models. However, the results suggest that the features extracted by the neural network are extremely shallow, and subsequent testing on out-of-sample examples shows that the agent can easily overfit to trajectories seen during training.
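As a rough illustration of how a key-value memory read through attention can yield Q-values whose attention weights double as an explanation, here is a minimal sketch. It assumes one memory per action, dot-product similarity followed by a softmax, and one scalar value per key. These choices, and the names `read_memory` and `q_values`, are hypothetical and not taken from the paper. The sketch also omits the encoder, the reconstruction objective for the embeddings, and the directed exploration strategy.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout (an assumption, not the paper's): one key-value memory per action.
# keys: list of embedding-sized vectors; values: one scalar return estimate per key.
Memory = Tuple[List[List[float]], List[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def read_memory(
    embedding: Sequence[float], keys: List[List[float]], values: List[float]
) -> Tuple[float, List[float]]:
    """Attend over the keys and return (Q-value, attention weights).

    Assumption: dot-product similarity followed by a softmax.
    """
    weights = softmax([dot(embedding, k) for k in keys])
    q = sum(w * v for w, v in zip(weights, values))
    return q, weights


def q_values(
    embedding: Sequence[float], memories: Dict[str, Memory]
) -> Dict[str, Tuple[float, List[float]]]:
    """Compute (Q-value, attention weights) for every action's memory."""
    return {
        action: read_memory(embedding, keys, values)
        for action, (keys, values) in memories.items()
    }


if __name__ == "__main__":
    state_embedding = [1.0, 0.0]  # stand-in for an encoder's output
    memories = {
        "left": ([[1.0, 0.0], [0.0, 1.0]], [1.0, -1.0]),
        "right": ([[0.0, 1.0]], [0.5]),
    }
    results = q_values(state_embedding, memories)
    for action, (q, weights) in results.items():
        print(action, round(q, 3), [round(w, 3) for w in weights])
    greedy = max(results, key=lambda a: results[a][0])
    print("greedy action:", greedy)  # right (0.5 > ~0.462)
```

The attention weights returned with each Q-value show which stored keys drove the estimate, which is one way to get inspectable structure of the kind the abstract describes. In a trained model, the embedding, keys, and values would be learned rather than hand-set as they are here.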

