LG · Dec 14, 2015

Memory-based control with recurrent neural networks

arXiv:1512.04455v1 · 329 citations
Originality: Incremental advance
AI Analysis

This work addresses memory challenges in reinforcement learning for robotics and AI control systems, representing an incremental improvement by adapting existing methods to partially observed domains.

The paper tackles partially observed control problems in reinforcement learning by extending the deterministic policy gradient and stochastic value gradient algorithms with recurrent neural networks. It demonstrates that this approach solves a variety of physical control tasks with differing memory requirements, including a simplified Morris water maze, and that it can learn directly from high-dimensional pixel observations.

Partially observed control problems are a challenging aspect of reinforcement learning. We extend two related, model-free algorithms for continuous control -- deterministic policy gradient and stochastic value gradient -- to solve partially observed domains using recurrent neural networks trained with backpropagation through time. We demonstrate that this approach, coupled with long short-term memory (LSTM), is able to solve a variety of physical control problems exhibiting an assortment of memory requirements. These include the short-term integration of information from noisy sensors and the identification of system parameters, as well as long-term memory problems that require preserving information over many time steps. We also demonstrate success on a combined exploration and memory problem in the form of a simplified version of the well-known Morris water maze task. Finally, we show that our approach can deal with high-dimensional observations by learning directly from pixels. We find that recurrent deterministic and stochastic policies are able to learn similarly good solutions to these tasks, including the water maze, where the agent must learn effective search strategies.
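
To make the role of memory concrete, the following is a minimal sketch of a recurrent deterministic policy: an LSTM cell carries a hidden state across time steps, so the action depends on the whole observation history rather than only the latest observation. It is not the authors' implementation. The class `TinyLSTMPolicy`, the helper `act_on_history`, the layer sizes, and the random weight initialisation are all hypothetical, and only the forward pass is shown; training the weights with backpropagation through time under a deterministic policy gradient or stochastic value gradient objective is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMPolicy:
    """Hypothetical single-layer LSTM policy (illustration only, untrained)."""

    def __init__(self, obs_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        in_dim = obs_dim + hidden_dim
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (g).
        self.W = {k: [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                      for _ in range(hidden_dim)] for k in "ifog"}
        self.b = {k: [0.0] * hidden_dim for k in "ifog"}
        # Linear read-out from the hidden state to a single scalar action.
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
        self.hidden_dim = hidden_dim

    def initial_state(self):
        # (hidden state h, cell state c), both zero at the start of an episode.
        return [0.0] * self.hidden_dim, [0.0] * self.hidden_dim

    def step(self, obs, state):
        h, c = state
        x = list(obs) + h  # concatenate current observation and previous hidden state

        def gate(k, act):
            return [act(sum(w * v for w, v in zip(row, x)) + b)
                    for row, b in zip(self.W[k], self.b[k])]

        i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
        g = gate("g", math.tanh)
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        # Deterministic action, squashed into [-1, 1].
        action = math.tanh(sum(w * hj for w, hj in zip(self.w_out, h)))
        return action, (h, c)


def act_on_history(policy, observations):
    """Feed an observation sequence through the policy and return the last action."""
    state = policy.initial_state()
    action = 0.0
    for obs in observations:
        action, state = policy.step(obs, state)
    return action


policy = TinyLSTMPolicy(obs_dim=1, hidden_dim=4)
# Two histories that end in the same observation but differ one step earlier.
a_left = act_on_history(policy, [[-1.0], [0.0]])
a_right = act_on_history(policy, [[1.0], [0.0]])
print(a_left, a_right)
```

Both histories end with the observation `0.0`, yet the two actions differ because the LSTM state retains the earlier input. A feedforward policy that sees only the final observation could not distinguish the two cases, which is the limitation that recurrent policies address in partially observed tasks.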

Code Implementations: 3 repos