AI LG ML | Aug 6, 2017

An Information-Theoretic Optimality Principle for Deep Reinforcement Learning

Felix Leibfried, Jordi Grau-Moya, Haitham Bou-Ammar

arXiv:1708.01867v514.224 citations

Originality: Incremental advance

AI Analysis

The paper addresses Q-value overestimation, a key bottleneck in deep reinforcement learning for applications such as game playing, though the advance appears incremental because it builds on existing Q-network frameworks.

The paper tackles Q-value overestimation in deep reinforcement learning for high-dimensional state spaces by introducing an information-theoretic penalty signal. The resulting algorithm outperforms deep and double deep Q-networks on Atari games in both game-play performance and sample complexity.

We methodologically address the problem of Q-value overestimation in deep reinforcement learning to handle high-dimensional state spaces efficiently. By adapting concepts from information theory, we introduce an intrinsic penalty signal encouraging reduced Q-value estimates. The resultant algorithm encompasses a wide range of learning outcomes containing deep Q-networks as a special case. Different learning outcomes can be demonstrated by tuning a Lagrange multiplier accordingly. We furthermore propose a novel scheduling scheme for this Lagrange multiplier to ensure efficient and robust learning. In experiments on Atari, our algorithm outperforms other algorithms (e.g. deep and double deep Q-networks) in terms of both game-play performance and sample complexity. These results remain valid under the recently proposed dueling architecture.
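The abstract does not state the update rule, so the following is only a minimal sketch of one way an information-theoretic penalty can reduce Q-value estimates. It assumes a free-energy style bootstrap value, (1/lam) * log sum_a prior(a) * exp(lam * Q(s', a)), with a uniform prior over actions. The names `soft_value`, `soft_target`, `lam`, and `prior` are hypothetical and chosen for illustration; the paper's exact operator and its scheduling scheme for the Lagrange multiplier are not reproduced here.

```python
import math


def soft_value(q_values, lam, prior=None):
    """Free-energy style state value (illustrative assumption, not the paper's code).

    Computes (1/lam) * log(sum_a prior[a] * exp(lam * q_values[a])).
    lam = 0 gives the prior-weighted mean; large lam approaches max(q_values),
    which is the bootstrap value used by a standard deep Q-network.
    Assumes lam >= 0 and a strictly positive prior that sums to 1.
    """
    n = len(q_values)
    if prior is None:
        prior = [1.0 / n] * n  # assumed uniform prior over actions
    if lam == 0.0:
        return sum(p * q for p, q in zip(prior, q_values))
    # Subtract the largest exponent before exponentiating for numerical stability.
    m = max(lam * q for q in q_values)
    total = sum(p * math.exp(lam * q - m) for p, q in zip(prior, q_values))
    return (m + math.log(total)) / lam


def soft_target(reward, next_q_values, gamma, lam, done=False):
    """One-step bootstrap target that uses soft_value in place of a hard max."""
    if done:
        return reward
    return reward + gamma * soft_value(next_q_values, lam)


if __name__ == "__main__":
    next_q = [1.0, 2.0, 3.0]
    # Sweeping the multiplier moves the target between the mean and the max.
    for lam in (0.0, 1.0, 10.0, 1000.0):
        print(lam, soft_target(reward=0.0, next_q_values=next_q, gamma=0.99, lam=lam))
```

Under these assumptions the target never exceeds the hard-max target for any finite `lam`, which is the sense in which the penalty lowers Q-value estimates, and the hard-max target is recovered as `lam` grows.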
