LGAIROFeb 24, 2021

Memory-based Deep Reinforcement Learning for POMDPs

arXiv:2102.12344v5132 citations
Originality Incremental advance
AI Analysis

This addresses the challenge of partial observability in real-world robotics, though it is an incremental improvement over existing methods.

The paper tackles the problem of applying deep reinforcement learning to partially observable environments by proposing LSTM-TD3, which integrates a memory component into TD3, and shows it outperforms other methods in handling missing and noisy data.

A promising characteristic of Deep Reinforcement Learning (DRL) is its capability to learn optimal policy in an end-to-end manner without relying on feature engineering. However, most approaches assume a fully observable state space, i.e. fully observable Markov Decision Processes (MDPs). In real-world robotics, this assumption is unpractical, because of issues such as sensor sensitivity limitations and sensor noise, and the lack of knowledge about whether the observation design is complete or not. These scenarios lead to Partially Observable MDPs (POMDPs). In this paper, we propose Long-Short-Term-Memory-based Twin Delayed Deep Deterministic Policy Gradient (LSTM-TD3) by introducing a memory component to TD3, and compare its performance with other DRL algorithms in both MDPs and POMDPs. Our results demonstrate the significant advantages of the memory component in addressing POMDPs, including the ability to handle missing and noisy observation data.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes