LG AI RO, Feb 19, 2021

Model-Invariant State Abstractions for Model-Based Reinforcement Learning

Manan Tomar, Amy Zhang, Roberto Calandra, Matthew E. Taylor, Joelle Pineau

arXiv:2102.09850v2, 14.633 citations

Originality: Incremental advance

AI Analysis

The paper addresses sample inefficiency in model-based reinforcement learning for complex tasks. It offers a novel approach with practical gains, though the advance appears incremental because it builds on existing state abstraction concepts.

The paper tackles the sample inefficiency of learning accurate dynamics models in model-based reinforcement learning by exploiting sparsity in the dynamics. It introduces a new state abstraction called model-invariance, which leverages causal sparsity over state variables to enable compositional generalization. The authors report improved modeling performance on tasks such as the MuJoCo-based Humanoid and better sample efficiency on other continuous control tasks.

Accuracy and generalization of dynamics models are key to the success of model-based reinforcement learning (MBRL). As the complexity of tasks increases, so does the sample inefficiency of learning accurate dynamics models. However, many complex tasks also exhibit sparsity in the dynamics, i.e., actions have only a local effect on the system dynamics. In this paper, we exploit this property with a causal invariance perspective in the single-task setting, introducing a new type of state abstraction called model-invariance. Unlike previous forms of state abstractions, a model-invariance state abstraction leverages causal sparsity over state variables. This allows for compositional generalization to unseen states, something that non-factored forms of state abstractions cannot do. We prove that an optimal policy can be learned over this model-invariance state abstraction and show improved generalization in a simple toy domain. Next, we propose a practical method to approximately learn a model-invariant representation for complex domains and validate our approach by showing improved modelling performance over standard maximum likelihood approaches on challenging tasks, such as the MuJoCo-based Humanoid. Finally, within the MBRL setting we show strong performance gains with respect to sample efficiency across a host of other continuous control tasks.
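
The compositional-generalization claim in the abstract can be illustrated with a small tabular sketch. This is not the paper's method or its toy domain: the two-variable system, the `true_step` dynamics, and the `PARENTS` map of causal parents are all hypothetical, and the parents are assumed to be known rather than learned. The sketch compares a model keyed on the full state with one keyed only on each variable's causal parents, after training on a subset of joint states.

```python
from itertools import product

# Hypothetical toy system: two state variables, each in {0, 1, 2}.
# The action has a local effect: it changes variable 0 only.
def true_step(state, action):
    x0, x1 = state
    return ((x0 + action) % 3, (x1 + 1) % 3)

# Assumed causal parents of each next-state variable:
# (indices of parent state variables, whether the action is a parent).
PARENTS = {0: ((0,), True), 1: ((1,), False)}

def fit_full_table(transitions):
    """Non-factored model: one entry per (full state, action) pair."""
    return {(s, a): s_next for s, a, s_next in transitions}

def parent_key(i, state, action):
    """Project (state, action) onto the parents of next-state variable i."""
    idx, uses_action = PARENTS[i]
    return (tuple(state[j] for j in idx), action if uses_action else None)

def fit_factored(transitions):
    """Factored model: one table per variable, keyed only on its parents."""
    tables = {i: {} for i in PARENTS}
    for s, a, s_next in transitions:
        for i in PARENTS:
            tables[i][parent_key(i, s, a)] = s_next[i]
    return tables

def predict_factored(tables, state, action):
    return tuple(tables[i].get(parent_key(i, state, action))
                 for i in sorted(PARENTS))

ACTIONS = (0, 1)
all_states = list(product(range(3), repeat=2))
# Train only on "diagonal" states; every off-diagonal state is unseen.
train_states = [s for s in all_states if s[0] == s[1]]
test_states = [s for s in all_states if s[0] != s[1]]

train = [(s, a, true_step(s, a)) for s in train_states for a in ACTIONS]
full_model = fit_full_table(train)
factored_model = fit_factored(train)

n_test = len(test_states) * len(ACTIONS)
full_hits = sum(full_model.get((s, a)) == true_step(s, a)
                for s in test_states for a in ACTIONS)
factored_hits = sum(predict_factored(factored_model, s, a) == true_step(s, a)
                    for s in test_states for a in ACTIONS)

print(f"full-state table: {full_hits}/{n_test} unseen transitions correct")
print(f"factored tables:  {factored_hits}/{n_test} unseen transitions correct")
```

Under these assumptions, the factored model predicts every held-out transition because each variable's parent values were seen during training, even though the joint states were not. The full-state table has no entry for any unseen joint state.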
