Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning
This work addresses robustness and generalisation in reinforcement learning for chaotic systems, but it appears incremental, since it applies existing complexity measures to analyse known regularisation effects.
The study investigated the regularisation properties of Maximum-Entropy Reinforcement Learning on chaotic dynamical systems with Gaussian observation noise. It found a relationship between entropy-regularised policy optimisation and robustness to that noise, which complexity measures from statistical learning theory can explain.
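For reference, the entropy-regularised objective optimised in Maximum-Entropy Reinforcement Learning is commonly written as follows. This is the standard textbook formulation, not necessarily the paper's exact notation. The temperature $\alpha$, horizon $T$, state-action marginal $\rho_\pi$ and noise scale $\sigma$ are symbols introduced here for illustration. The second line formalises "Gaussian noise on the observable": the agent acts on a corrupted observation $o_t$ rather than on the true state $s_t$.

$$
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\Big[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[ \log \pi(a \mid s) \big]
$$

$$
o_t = s_t + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, \sigma^2 I), \qquad a_t \sim \pi(\cdot \mid o_t)
$$

Setting $\alpha = 0$ recovers the standard expected-return objective. Larger $\alpha$ rewards higher-entropy (more stochastic) policies, which is the regularisation effect under study.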
The generalisation and robustness properties of policies learnt through Maximum-Entropy Reinforcement Learning are investigated on chaotic dynamical systems with Gaussian noise on the observable. First, the robustness of entropy-regularised policies to noise contaminating the agent's observation is observed. Second, notions from statistical learning theory, such as complexity measures on the learnt model, are borrowed to explain and predict the phenomenon. Results show the existence of a relationship between entropy-regularised policy optimisation and robustness to noise, which can be described by the chosen complexity measures.
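A minimal sketch of how robustness to observation noise might be measured: roll out a fixed policy while corrupting only what the agent sees with $\mathcal{N}(0, \sigma^2)$ noise, then track how the average return degrades as $\sigma$ grows. Everything below is an assumption for illustration and not the authors' code. The controlled logistic map (parameter 3.9, a well-known chaotic regime) stands in for the paper's chaotic systems. The names `env_step`, `feedback_policy` and `noisy_return`, the feedback gain, the control bound `U_MAX` and the exploration scale `std` are all hypothetical.

```python
import numpy as np

# Hypothetical chaotic environment (assumption, not the paper's benchmark):
# a logistic map with an additive, bounded control input.
R = 3.9                 # logistic-map parameter in the chaotic regime
X_STAR = 1.0 - 1.0 / R  # unstable fixed point the controller tries to hold
U_MAX = 0.1             # hypothetical bound on the control magnitude


def env_step(x, u):
    """Advance the controlled logistic map one step; return (next_state, reward)."""
    x_next = float(np.clip(R * x * (1.0 - x) + u, 0.0, 1.0))
    reward = -(x_next - X_STAR) ** 2  # penalise distance from the fixed point
    return x_next, reward


def feedback_policy(obs, rng, gain=1.9, std=0.0):
    """Hypothetical Gaussian policy: linear feedback mean plus noise of scale `std`.

    The map's slope at X_STAR is 2 - R = -1.9, so gain = 1.9 makes the
    linearised closed loop have slope zero. Larger `std` means higher entropy.
    """
    mean = gain * (obs - X_STAR)
    action = mean + std * rng.standard_normal()
    return float(np.clip(action, -U_MAX, U_MAX))


def noisy_return(policy, sigma, episodes=50, horizon=100, seed=0):
    """Average undiscounted return when the agent observes x_t + N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    totals = []
    for _ in range(episodes):
        x = X_STAR + rng.uniform(-0.05, 0.05)  # start near the fixed point
        total = 0.0
        for _ in range(horizon):
            # Noise contaminates the observation only; the true state is untouched.
            obs = x + sigma * rng.standard_normal()
            x, r = env_step(x, policy(obs, rng))
            total += r
        totals.append(total)
    return float(np.mean(totals))


if __name__ == "__main__":
    for sigma in (0.0, 0.01, 0.05, 0.1):
        print(f"sigma={sigma:.2f}  mean return={noisy_return(feedback_policy, sigma):.4f}")
```

The hand-tuned feedback gain only stands in for a learnt policy. Comparing agents trained with different entropy temperatures across the same $\sigma$ grid is one way to probe the entropy/robustness relationship, but this sketch does not reproduce the paper's results.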