LG, ML | Apr 17, 2018

On Improving Deep Reinforcement Learning for POMDPs

arXiv:1804.06309v2 | 139 citations
Originality: Incremental advance
AI Analysis

This paper addresses a gap in deep RL for partially observable environments. Its contribution is a domain-specific architecture, and the advance is incremental in nature.

The paper tackles the problem of applying deep reinforcement learning to partially observable Markov decision processes (POMDPs). It proposes the Action-specific Deep Recurrent Q-Network (ADRQN) architecture, which improves learning performance in domains such as flickering Atari games.

Deep Reinforcement Learning (RL) recently emerged as one of the most competitive approaches for learning in sequential decision making problems with fully observable environments, e.g., computer Go. However, very little work has been done in deep RL to handle partially observable environments. We propose a new architecture called Action-specific Deep Recurrent Q-Network (ADRQN) to enhance learning performance in partially observable domains. Actions are encoded by a fully connected layer and coupled with a convolutional observation to form an action-observation pair. The time series of action-observation pairs are then integrated by an LSTM layer that learns latent states based on which a fully connected layer computes Q-values as in conventional Deep Q-Networks (DQNs). We demonstrate the effectiveness of our new architecture in several partially observable domains, including flickering Atari games.
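
The data flow described in the abstract can be illustrated with a minimal forward-pass sketch in NumPy. This is not the authors' implementation: the class name `ADRQNSketch`, all layer sizes, the single convolution layer, the random weights, and the choice to pair each observation with the previous action are assumptions made for illustration only. Training (replay, target network, loss) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ADRQNSketch:
    """Forward pass only; every size below is an illustrative assumption."""

    def __init__(self, n_actions, obs_shape=(8, 8), kernel=3, n_filters=4,
                 act_dim=8, hidden=16):
        init = lambda *shape: rng.normal(0.0, 0.1, size=shape)
        self.n_actions, self.hidden = n_actions, hidden
        # Fully connected layer that embeds a one-hot action.
        self.W_act, self.b_act = init(n_actions, act_dim), np.zeros(act_dim)
        # One small convolution stands in for the convolutional observation encoder.
        self.K = init(n_filters, kernel, kernel)
        obs_dim = n_filters * (obs_shape[0] - kernel + 1) * (obs_shape[1] - kernel + 1)
        # LSTM weights for the four gates, applied to [input, previous hidden].
        self.W_lstm = init(act_dim + obs_dim + hidden, 4 * hidden)
        self.b_lstm = np.zeros(4 * hidden)
        # Fully connected head mapping the latent state to one Q-value per action.
        self.W_q, self.b_q = init(hidden, n_actions), np.zeros(n_actions)

    def encode_action(self, action):
        one_hot = np.eye(self.n_actions)[action]
        return one_hot @ self.W_act + self.b_act

    def encode_obs(self, obs):
        k = self.K.shape[1]
        # All k-by-k patches of the 2D observation: shape (out_h, out_w, k, k).
        windows = np.lib.stride_tricks.sliding_window_view(obs, (k, k))
        feat = np.einsum("hwij,fij->fhw", windows, self.K)
        return np.maximum(feat, 0.0).ravel()  # ReLU, then flatten

    def lstm_step(self, x, h, c):
        z = np.concatenate([x, h]) @ self.W_lstm + self.b_lstm
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, c

    def q_values(self, prev_actions, observations):
        """Return Q-values of shape (T, n_actions) for a length-T history."""
        h, c = np.zeros(self.hidden), np.zeros(self.hidden)
        out = []
        for a, o in zip(prev_actions, observations):
            # Action-observation pair: concatenate the two encodings.
            x = np.concatenate([self.encode_action(a), self.encode_obs(o)])
            h, c = self.lstm_step(x, h, c)
            out.append(h @ self.W_q + self.b_q)
        return np.stack(out)

# Usage with random data: 5 time steps, 4 actions, 8x8 observations.
T, n_actions = 5, 4
observations = rng.random((T, 8, 8))
prev_actions = [0, 2, 1, 3, 0]
model = ADRQNSketch(n_actions)
q = model.q_values(prev_actions, observations)  # shape (T, n_actions)
greedy_action = int(np.argmax(q[-1]))
```

Because the LSTM state carries the whole action-observation history, the Q-values at the last step depend on every earlier pair, which is what lets such a network act when a single observation does not reveal the state.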

