LG · AI · Oct 31, 2017

Regret Minimization for Partially Observable Deep Reinforcement Learning

arXiv:1710.11424v2 · 54 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of partial observability in reinforcement learning for domains such as games and robotics. It appears incremental, however, because it builds on existing regret minimization methods.

The authors address deep reinforcement learning in partially observable environments by proposing a new algorithm based on counterfactual regret minimization. It substantially outperforms strong baselines on tasks such as first-person 3D navigation in Doom and Minecraft.

Deep reinforcement learning algorithms that estimate state and state-action value functions have been shown to be effective in a variety of challenging domains, including learning control strategies from raw image pixels. However, algorithms that estimate state and state-action value functions typically assume a fully observed state and must compensate for partial observations by using finite length observation histories or recurrent networks. In this work, we propose a new deep reinforcement learning algorithm based on counterfactual regret minimization that iteratively updates an approximation to an advantage-like function and is robust to partially observed state. We demonstrate that this new algorithm can substantially outperform strong baseline methods on several partially observed reinforcement learning tasks: learning first-person 3D navigation in Doom and Minecraft, and acting in the presence of partially observed objects in Doom and Pong.
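The abstract does not give the update rule. To illustrate the regret-minimization idea the algorithm builds on, here is a minimal tabular sketch of regret matching, the per-information-set policy rule used in counterfactual regret minimization (CFR). This is not the paper's deep RL method: there are no neural networks, and the per-action values are simply assumed to be given. The function names `regret_matching_policy` and `update_regrets` are hypothetical. An action's instantaneous regret is its value minus the current policy's expected value, which has the same form as an advantage Q(a) − V. That is the sense in which CFR-style methods work with an "advantage-like" quantity.

```python
from typing import List


def regret_matching_policy(regrets: List[float]) -> List[float]:
    """Map cumulative per-action regrets to a policy (regret matching).

    Each action with positive regret gets probability proportional to that
    regret. If no action has positive regret, fall back to uniform.
    """
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0.0:
        return [p / total for p in positive]
    n = len(regrets)
    return [1.0 / n] * n


def update_regrets(
    regrets: List[float], action_values: List[float], policy: List[float]
) -> List[float]:
    """Add one step of instantaneous regret to the cumulative regrets.

    The instantaneous regret of action a is q(a) minus the policy's expected
    value sum_b pi(b) * q(b), i.e. an advantage-like quantity Q(a) - V.
    """
    expected = sum(p * q for p, q in zip(policy, action_values))
    return [r + (q - expected) for r, q in zip(regrets, action_values)]


if __name__ == "__main__":
    # Hypothetical fixed action values for a single observation/information set.
    values = [1.0, 0.0, -1.0]
    regrets = [0.0] * len(values)
    for step in range(3):
        policy = regret_matching_policy(regrets)
        regrets = update_regrets(regrets, values, policy)
        print(step, policy)
```

With fixed action values, the policy shifts all probability onto the highest-value action once that action accumulates positive regret. In CFR the values instead come from counterfactual evaluations that change as the policies change. A deep variant, as the abstract describes, would replace the table with a learned approximation.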

Code Implementations: 1 repo