LG, AI · Dec 10, 2019

Efficient and Robust Reinforcement Learning with Uncertainty-based Value Expansion

Bo Zhou, Hongsheng Zeng, Fan Wang, Yunxiang Li, Hao Tian

arXiv:1912.05328v19.518 citations

Originality: Highly original

AI Analysis

This work addresses the robustness of model-based RL in stochastic environments; the contribution is an incremental improvement over existing model-based value expansion methods.

The paper targets the high function approximation errors that model-based reinforcement learning methods incur in stochastic environments. It proposes a hybrid method, Risk Averse Value Expansion (RAVE), which uses an ensemble of probabilistic dynamics models and lower confidence bounds to improve robustness. The authors report state-of-the-art performance and first place in the NeurIPS 2019 Learn to Move competition.

By integrating dynamics models into model-free reinforcement learning (RL) methods, model-based value expansion (MVE) algorithms have shown a significant advantage in sample efficiency as well as value estimation. However, in stochastic environments these methods suffer from higher function approximation errors than model-free methods because they do not model environmental randomness. As a result, their performance lags behind the best model-free algorithms in some challenging scenarios. In this paper, we propose a novel Hybrid-RL method that builds on MVE, namely Risk Averse Value Expansion (RAVE). Using imaginative rollouts generated by an ensemble of probabilistic dynamics models, we further introduce risk aversion by taking the lower confidence bound of the value estimate. Experiments on a range of challenging environments show that, by modeling the uncertainty completely, RAVE substantially enhances the robustness of previous model-based methods and yields state-of-the-art performance. With this technique, our solution took first place in NeurIPS 2019: Learn to Move.
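
The abstract gives no formulas, so the following is only a minimal sketch of the general idea of a risk-averse value target, not the authors' implementation. It assumes each ensemble member has already produced one imagined rollout (a reward sequence plus a bootstrap value), and it takes the lower confidence bound as the ensemble mean minus a multiple of the ensemble standard deviation. The names `h_step_target` and `lower_confidence_bound`, and the parameters `gamma` and `alpha`, are hypothetical and chosen for illustration.

```python
import numpy as np


def h_step_target(rewards, bootstrap_value, gamma=0.99):
    """Discounted sum of imagined rewards plus a discounted bootstrap value.

    `rewards` are the rewards along one imagined rollout and
    `bootstrap_value` is the critic's estimate at the final imagined state.
    Both are assumed inputs; the abstract does not specify them.
    """
    rewards = np.asarray(rewards, dtype=float)
    horizon = len(rewards)
    discounts = gamma ** np.arange(horizon)
    return float(np.dot(discounts, rewards) + gamma ** horizon * bootstrap_value)


def lower_confidence_bound(targets, alpha=1.0):
    """Ensemble mean minus alpha population standard deviations.

    `alpha` (assumed) controls risk aversion: alpha = 0 recovers the
    plain ensemble mean, larger values give more pessimistic targets.
    """
    targets = np.asarray(targets, dtype=float)
    return float(targets.mean() - alpha * targets.std())


if __name__ == "__main__":
    # Toy rollouts from three hypothetical ensemble members.
    rollouts = [
        ([1.0, 1.0, 1.0], 10.0),
        ([1.0, 0.5, 0.0], 8.0),
        ([1.0, 1.2, 1.1], 11.0),
    ]
    targets = [h_step_target(r, v) for r, v in rollouts]
    print("per-member targets:", targets)
    print("risk-averse target:", lower_confidence_bound(targets, alpha=1.0))
```

When the ensemble members disagree, the standard deviation grows and the target is pulled downward, which is one simple way to express the risk aversion the abstract describes.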
