Maximum Total Correlation Reinforcement Learning
This work addresses the need for more generalizable and robust reinforcement learning policies, though the contribution appears incremental because it builds on existing regularization and simplicity-promoting techniques.
The paper tackles the problem of promoting simple behavior in reinforcement learning by maximizing the total correlation within trajectories, which yields policies that are more robust to noise and changes in dynamics and that also perform better on the original tasks in simulated robot environments.
Simplicity is a powerful inductive bias. In reinforcement learning, regularization is used for simpler policies, data augmentation for simpler representations, and sparse reward functions for simpler objectives, all with the underlying motivation of increasing generalizability and robustness by focusing on the essentials. Complementing these techniques, we investigate how to promote simple behavior throughout the episode. To that end, we introduce a modification of the reinforcement learning problem that additionally maximizes the total correlation within the induced trajectories. We propose a practical algorithm that optimizes all models, including the policy and the state representation, based on a lower-bound approximation. In simulated robot environments, our method naturally generates policies that induce periodic and compressible trajectories and that exhibit superior robustness to noise and changes in dynamics compared to baseline methods, while also improving performance in the original tasks.
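
To make the maximized quantity concrete: the total correlation of a set of random variables is the sum of their marginal entropies minus their joint entropy. It is zero when the variables are independent and grows as they become more redundant. The sketch below is not the lower-bound algorithm proposed in the paper. It is a minimal illustration that assumes a jointly Gaussian trajectory, for which total correlation has a closed form, and the function name and the autoregressive-style covariance are hypothetical choices made only for this example.

```python
import numpy as np


def gaussian_total_correlation(cov):
    """Total correlation (in nats) of a zero-mean multivariate Gaussian.

    For a Gaussian, sum_i H(X_i) - H(X_1, ..., X_n) simplifies to
    0.5 * (sum_i log cov[i, i] - log det cov).
    Assumption: `cov` is a symmetric positive-definite covariance matrix.
    """
    cov = np.asarray(cov, dtype=float)
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * (np.sum(np.log(np.diag(cov))) - logdet)


# Hypothetical 5-step "trajectories" of a scalar state.
steps = np.arange(5)
lags = np.abs(np.subtract.outer(steps, steps))

# Independent time steps: nothing about one step predicts another.
independent_cov = np.eye(5)

# Strongly dependent time steps: correlation decays as 0.9 ** lag.
dependent_cov = 0.9 ** lags

tc_independent = gaussian_total_correlation(independent_cov)
tc_dependent = gaussian_total_correlation(dependent_cov)

print("independent:", tc_independent)
print("dependent:  ", tc_dependent)
```

Under this Gaussian assumption, the trajectory whose time steps are strongly correlated has the larger total correlation, which matches the intuition that predictable, compressible trajectories score highly under this measure.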