LGAINE
Mar 11, 2019

Deep Recurrent Q-Learning vs Deep Q-Learning on a simple Partially Observable Markov Decision Process with Minecraft

arXiv:1903.04311v2 · 5 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the choice of reinforcement learning architectures for partially observable environments, but it is incremental as it tests existing methods on new data without introducing novel techniques.

The study compared Deep Q-Learning (DQN) and Deep Recurrent Q-Learning (DRQN) on simple partially observable tasks in Minecraft, finding that DRQN, while more complex and slower to train, did not always outperform DQN in these scenarios.

Deep Q-Learning has been successfully applied to a wide variety of tasks in the past several years. However, the architecture of the vanilla Deep Q-Network is not suited to partially observable environments such as 3D video games. To address this, recurrent layers have been added to the Deep Q-Network so that it can handle dependencies on past observations. Here we use Minecraft for its customization advantages and design two very simple missions that can be framed as Partially Observable Markov Decision Processes. On these missions we compare the Deep Q-Network and the Deep Recurrent Q-Network to see whether the latter, which is trickier and slower to train, is always the best architecture when the agent has to deal with partial observability.
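To make the role of memory concrete, here is a minimal sketch of why an agent that sees only the current observation can fail under partial observability. It is not the paper's code and does not use Minecraft or neural networks. The two-step "cue recall" task, the function name `greedy_accuracy`, and all hyperparameters are assumptions made for illustration. Tabular Q-learning keyed on the current observation stands in for the memoryless DQN, and the same learner keyed on the observation history stands in for the recurrent state of a DRQN.

```python
import random


def greedy_accuracy(use_history, episodes=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Train tabular Q-learning on a toy cue-recall task and score the greedy policy.

    Hypothetical task: the agent first observes a cue (0 or 1), then a blank
    observation (2). At the blank step it earns reward 1 if its action equals
    the cue and 0 otherwise. The episode ends after that single decision.
    """
    rng = random.Random(seed)
    q = {}  # maps (agent_state, action) -> estimated value

    def agent_state(cue):
        # With history the agent remembers the cue; without it, only the blank is visible.
        return (cue, 2) if use_history else (2,)

    def greedy(state):
        # Ties resolve to action 0 because max() returns the first maximal item.
        return max((0, 1), key=lambda a: q.get((state, a), 0.0))

    for _ in range(episodes):
        cue = rng.randrange(2)
        state = agent_state(cue)
        # Epsilon-greedy action selection.
        action = rng.randrange(2) if rng.random() < epsilon else greedy(state)
        reward = 1.0 if action == cue else 0.0
        old = q.get((state, action), 0.0)
        # The decision step is terminal, so the Q-learning target is just the reward.
        q[(state, action)] = old + alpha * (reward - old)

    # Fraction of the two possible cues that the greedy policy answers correctly.
    return sum(greedy(agent_state(cue)) == cue for cue in (0, 1)) / 2


if __name__ == "__main__":
    print("memoryless:", greedy_accuracy(use_history=False))
    print("with history:", greedy_accuracy(use_history=True))
```

The memoryless learner maps both cues to the same state, so its greedy policy can be right for at most one of the two cues, whatever the training budget. The history-based learner can separate the two cases. This toy task is built so that memory is strictly required. The paper's question is whether the extra training cost of recurrence pays off on missions where the dependence on the past is milder.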

Code Implementations: 2 repos
