LGAIITROMLJul 24, 2020

Predictive Information Accelerates Learning in RL

arXiv:2007.12401v282 citations
AI Analysis

This work addresses sample efficiency for RL practitioners, but it is incremental as it builds on existing methods like SAC and CEB.

The paper tackled the problem of improving sample efficiency in reinforcement learning by incorporating predictive information, showing that PI-SAC agents achieved substantial gains over baselines on DM Control tasks.

The Predictive Information is the mutual information between the past and the future, I(X_past; X_future). We hypothesize that capturing the predictive information is useful in RL, since the ability to model what will happen next is necessary for success on many tasks. To test our hypothesis, we train Soft Actor-Critic (SAC) agents from pixels with an auxiliary task that learns a compressed representation of the predictive information of the RL environment dynamics using a contrastive version of the Conditional Entropy Bottleneck (CEB) objective. We refer to these as Predictive Information SAC (PI-SAC) agents. We show that PI-SAC agents can substantially improve sample efficiency over challenging baselines on tasks from the DM Control suite of continuous control environments. We evaluate PI-SAC agents by comparing against uncompressed PI-SAC agents, other compressed and uncompressed agents, and SAC agents directly trained from pixels. Our implementation is given on GitHub.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes