AI · LG · ML · Aug 6, 2017

An Information-Theoretic Optimality Principle for Deep Reinforcement Learning

arXiv:1708.01867v5 · 24 citations
Originality: Incremental advance
AI Analysis

This paper addresses a key bottleneck in deep reinforcement learning, Q-value overestimation, for applications such as game playing, though the advance appears incremental because it builds on existing Q-network frameworks.

The paper tackled Q-value overestimation in deep reinforcement learning for high-dimensional state spaces by introducing an information-theoretic penalty signal, resulting in an algorithm that outperformed deep and double deep Q-networks on Atari games in both performance and sample complexity.

We methodologically address the problem of Q-value overestimation in deep reinforcement learning to handle high-dimensional state spaces efficiently. By adapting concepts from information theory, we introduce an intrinsic penalty signal encouraging reduced Q-value estimates. The resultant algorithm encompasses a wide range of learning outcomes containing deep Q-networks as a special case. Different learning outcomes can be demonstrated by tuning a Lagrange multiplier accordingly. We furthermore propose a novel scheduling scheme for this Lagrange multiplier to ensure efficient and robust learning. In experiments on Atari, our algorithm outperforms other algorithms (e.g. deep and double deep Q-networks) in terms of both game-play performance and sample complexity. These results remain valid under the recently proposed dueling architecture.
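The abstract does not spell out the update rule, so the following is only a minimal sketch of one way such a penalty can act on the bootstrap target. It assumes the hard max over next-state Q-values is replaced by a prior-weighted log-sum-exp, with `beta` standing in for the Lagrange multiplier. The names `soft_value` and `td_target`, the uniform default prior, and the exact functional form are illustrative assumptions, not the paper's definitions.

```python
import math


def soft_value(q_values, beta, prior=None):
    """Prior-weighted log-sum-exp: (1/beta) * log sum_a prior(a) * exp(beta * Q(a)).

    Assumed form, for illustration only. beta = 0 gives the prior-weighted
    mean of the Q-values; beta = inf gives the hard max used by standard DQN.
    """
    if beta < 0:
        raise ValueError("beta must be non-negative")
    n = len(q_values)
    if prior is None:
        prior = [1.0 / n] * n  # assumption: uniform prior over actions
    if beta == 0:
        return sum(p * q for p, q in zip(prior, q_values))
    if math.isinf(beta):
        return max(q for p, q in zip(prior, q_values) if p > 0)
    # Subtract the largest Q-value before exponentiating for numerical stability.
    m = max(q_values)
    total = sum(p * math.exp(beta * (q - m)) for p, q in zip(prior, q_values))
    return m + math.log(total) / beta


def td_target(reward, next_q_values, gamma, beta, done=False):
    """One-step bootstrap target using the soft value instead of the hard max."""
    if done:
        return reward
    return reward + gamma * soft_value(next_q_values, beta)


if __name__ == "__main__":
    next_q = [1.0, 2.0, 3.0]
    for beta in (0.0, 1.0, 10.0, math.inf):
        print(beta, td_target(1.0, next_q, gamma=0.9, beta=beta))
```

Under these assumptions, a finite `beta` always yields a target no larger than the hard-max target, which is the sense in which the penalty pushes Q-value estimates down; letting `beta` grow recovers the deep Q-network target as a limiting case.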

