LG ML Jan 15, 2020

SEERL: Sample Efficient Ensemble Reinforcement Learning

Rohan Saphal, Balaraman Ravindran, Dheevatsa Mudigere, Sasikanth Avancha, Bharat Kaul

arXiv:2001.05209v2 10.621 citations

Originality: Incremental advance

AI Analysis

This work aims to make ensemble methods more practical for reinforcement learning practitioners by reducing their sample and computational costs. The advance is incremental, since it builds on existing ensemble and RL techniques.

The paper tackles the high sample complexity and computational expense of ensemble methods in reinforcement learning. It introduces a training and model selection framework that learns diverse policies from a single training run through directed perturbations of the model parameters. The authors report that the framework is substantially sample efficient and computationally inexpensive, and that it outperforms state-of-the-art scores on the Atari 2600 and MuJoCo benchmarks.

Ensemble learning is a widely used method in machine learning. The relative success of ensemble methods is attributed to their ability to tackle a wide range of instances and complex problems that require different low-level approaches. However, ensemble methods are less popular in reinforcement learning owing to the high sample complexity and computational expense involved in obtaining a diverse ensemble. We present a novel training and model selection framework for model-free reinforcement learning algorithms that use ensembles of policies obtained from a single training run. These policies are diverse and are learned through directed perturbation of the model parameters at regular intervals. We show that learning and selecting an adequately diverse set of policies is required for a good ensemble, while extreme diversity can prove detrimental to overall performance. Selection of an adequately diverse set of policies is done through our novel policy selection framework. We evaluate our approach on challenging discrete and continuous control tasks and also discuss various ensembling strategies. Our framework is substantially sample efficient and computationally inexpensive, and is seen to outperform state-of-the-art (SOTA) scores in Atari 2600 and MuJoCo.
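The abstract mentions ensembling strategies over policies without naming them here. As a minimal sketch, assuming discrete actions and assuming majority voting as the strategy (an illustrative choice, not confirmed by the abstract), an ensemble of policy snapshots could pick an action as follows. The names `Policy` and `majority_vote` are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical type: a policy maps an observation to a discrete action index.
Policy = Callable[[Sequence[float]], int]


def majority_vote(policies: Sequence[Policy], observation: Sequence[float]) -> int:
    """Return the action picked by the most policies.

    Ties are broken in favour of the action that was voted for first,
    which follows from Counter preserving insertion order.
    """
    if not policies:
        raise ValueError("the ensemble needs at least one policy")
    votes = [policy(observation) for policy in policies]
    return Counter(votes).most_common(1)[0][0]


# Toy stand-ins for policy snapshots taken during a single training run.
snapshots = [
    lambda obs: 0,
    lambda obs: 1 if obs[0] > 0.5 else 0,
    lambda obs: 1,
]
action = majority_vote(snapshots, [0.9])
```

In this toy case two of the three snapshots vote for action 1 when the first observation feature exceeds 0.5, so the ensemble follows them. Continuous control tasks would need a different combination rule, such as averaging actions, which this sketch does not cover.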
