Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models
This work proposes a novel method for two known bottlenecks in real-world reinforcement learning applications: low data efficiency and weak generalization across tasks.
The paper tackles low data efficiency and weak generalization in reinforcement learning with a Thompson-sampling-based approach built around a graph structured surrogate model (GSSM) for predicting environment dynamics. The GSSM outperforms state-of-the-art methods at dynamics prediction, and the overall approach achieves high returns while executing quickly at deployment because it avoids test-time policy gradient optimization.
Reinforcement learning is a promising paradigm for solving sequential decision-making problems, but low data efficiency and weak generalization across tasks are bottlenecks in real-world applications. Model-based meta reinforcement learning addresses these issues by learning dynamics and leveraging knowledge from prior experience. In this paper, we take a closer look at this framework and propose a new Thompson-sampling-based approach that combines a new model for identifying task dynamics with an amortized policy optimization step. We show that our model, called a graph structured surrogate model (GSSM), outperforms state-of-the-art methods in predicting environment dynamics. Additionally, our approach obtains high returns while allowing fast execution during deployment by avoiding test-time policy gradient optimization.
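As a rough illustration of the Thompson-sampling idea described above, the following Python sketch samples a task-dynamics parameter from a posterior, acts with a policy conditioned on that sample, and updates the posterior from the observed transition. It is not the paper's GSSM or its policy. The scalar linear dynamics, the Gaussian conjugate update, the quadratic objective, and the names `amortized_policy` and `run_thompson_episode` are all assumptions chosen to keep the example small and self-contained.

```python
import numpy as np


def amortized_policy(state, theta):
    """Hypothetical policy conditioned on sampled dynamics.

    It maps (state, sampled parameter) directly to an action, so no
    gradient-based policy optimization is run at test time.
    """
    # Guard against dividing by a sample that is very close to zero.
    safe_theta = theta if abs(theta) > 1e-3 else 1e-3
    return float(np.clip(-state / safe_theta, -2.0, 2.0))


def run_thompson_episode(true_theta, steps=50, noise_std=0.1, seed=0):
    """Thompson sampling on assumed dynamics s' = s + theta * a + noise.

    The goal in this toy task is to drive the state toward zero.
    Returns the posterior mean, posterior precision, and final state.
    """
    rng = np.random.default_rng(seed)
    mean, precision = 0.0, 1.0  # Gaussian prior N(0, 1) over theta
    state = 1.0
    for _ in range(steps):
        # 1. Sample one plausible task from the current posterior.
        theta_sample = rng.normal(mean, 1.0 / np.sqrt(precision))
        # 2. Act as if the sampled task were the true one.
        action = amortized_policy(state, theta_sample)
        # 3. Observe a transition from the true (unknown) dynamics.
        next_state = state + true_theta * action + rng.normal(0.0, noise_std)
        # 4. Conjugate Gaussian update from the observed state change.
        delta = next_state - state
        new_precision = precision + action**2 / noise_std**2
        mean = (precision * mean + action * delta / noise_std**2) / new_precision
        precision = new_precision
        state = next_state
    return mean, precision, state


if __name__ == "__main__":
    mean, precision, state = run_thompson_episode(true_theta=1.5)
    print(f"posterior mean={mean:.3f}, precision={precision:.1f}, state={state:.3f}")
```

In this sketch, exploration comes only from the randomness of the posterior sample: early samples are spread widely and produce informative actions, and as the posterior tightens the sampled parameter and the resulting behavior converge toward those of the true task.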