LG ML Jun 4, 2020

Meta-Model-Based Meta-Policy Optimization

Takuya Hiraoka, Takahisa Imagawa, Voot Tangkaratt, Takayuki Osa, Takashi Onishi, Yoshimasa Tsuruoka

arXiv:2006.02608v59.69 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses the theoretical understanding and performance guarantees of model-based meta-RL in multi-task settings. The advance is incremental because it builds on prior theorems.

The paper tackles the lack of theoretical guarantees for model-based meta-reinforcement learning by extending the theorems of Janner et al. (2019) to derive performance guarantees. On that basis it proposes M3PO, which the authors report outperforms existing meta-RL methods on continuous-control benchmarks.

Model-based meta-reinforcement learning (RL) methods have recently been shown to be a promising approach to improving the sample efficiency of RL in multi-task settings. However, the theoretical understanding of those methods is yet to be established, and there is currently no theoretical guarantee of their performance in a real-world environment. In this paper, we analyze the performance guarantee of model-based meta-RL methods by extending the theorems proposed by Janner et al. (2019). On the basis of our theoretical results, we propose Meta-Model-Based Meta-Policy Optimization (M3PO), a model-based meta-RL method with a performance guarantee. We demonstrate that M3PO outperforms existing meta-RL methods in continuous-control benchmarks.
