LG · AI · Dec 6, 2021

ED2: Environment Dynamics Decomposition World Models for Continuous Control

arXiv:2112.02817v21 citations · Has Code
Originality: Incremental advance
AI Analysis

The paper targets the sample-efficiency limitation of MBRL for continuous control. The advance is incremental, since it builds on existing MBRL algorithms.

The paper tackles the problem of model prediction error in model-based reinforcement learning (MBRL) by proposing ED2, a framework that decomposes environment dynamics into sub-dynamics, which reduces model error and improves sample efficiency and asymptotic performance on continuous control tasks.

Model-based reinforcement learning (MBRL) achieves significant sample efficiency in practice in comparison to model-free RL, but its performance is often limited by the existence of model prediction error. To reduce the model error, standard MBRL approaches train a single well-designed network to fit the entire environment dynamics, but this wastes rich information on multiple sub-dynamics which can be modeled separately, allowing us to construct the world model more accurately. In this paper, we propose the Environment Dynamics Decomposition (ED2), a novel world model construction framework that models the environment in a decomposing manner. ED2 contains two key components: sub-dynamics discovery (SD2) and dynamics decomposition prediction (D2P). SD2 discovers the sub-dynamics in an environment automatically and then D2P constructs the decomposed world model following the sub-dynamics. ED2 can be easily combined with existing MBRL algorithms and empirical results show that ED2 significantly reduces the model error, increases the sample efficiency, and achieves higher asymptotic performance when combined with the state-of-the-art MBRL algorithms on various continuous control tasks. Our code is open source and available at https://github.com/ED2-source-code/ED2.
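To make the decomposition idea concrete, here is a minimal sketch of a world model assembled from sub-dynamics models. It is not the authors' implementation. The abstract does not say how SD2 partitions the dynamics or how D2P combines the sub-predictions, so this sketch assumes that the sub-dynamics correspond to groups of action dimensions and that each group's predicted state change is simply summed. All names (`DecomposedWorldModel`, `groups`, `sub_models`) are hypothetical, and hand-written functions stand in for learned networks.

```python
from typing import Callable, List, Sequence

State = List[float]
Action = List[float]
# A sub-model maps (state, sub-action) to a predicted change in the full state.
SubModel = Callable[[State, Action], State]


class DecomposedWorldModel:
    """Toy world model built from one sub-model per action-dimension group.

    Assumption (not from the source): sub-dynamics are indexed by groups of
    action dimensions, and their predicted state changes are added together.
    """

    def __init__(self, groups: Sequence[Sequence[int]], sub_models: Sequence[SubModel]):
        if len(groups) != len(sub_models):
            raise ValueError("need exactly one sub-model per action group")
        self.groups = [list(g) for g in groups]
        self.sub_models = list(sub_models)

    def predict(self, state: State, action: Action) -> State:
        delta = [0.0] * len(state)
        for dims, sub_model in zip(self.groups, self.sub_models):
            # Each sub-model only sees the action dimensions in its group.
            sub_action = [action[i] for i in dims]
            contribution = sub_model(state, sub_action)
            delta = [d + c for d, c in zip(delta, contribution)]
        # Next state = current state + summed sub-dynamics contributions.
        return [s + d for s, d in zip(state, delta)]


# Hand-written stand-ins for learned sub-dynamics networks (illustrative only).
def first_joint(state: State, sub_action: Action) -> State:
    return [0.5 * sub_action[0], 0.0]


def second_joint(state: State, sub_action: Action) -> State:
    return [0.0, -0.25 * sub_action[0]]


model = DecomposedWorldModel(groups=[[0], [1]], sub_models=[first_joint, second_joint])
next_state = model.predict([1.0, 2.0], [10.0, 4.0])
print(next_state)  # [6.0, 1.0]
```

In a learned setting each stand-in function would be replaced by a trained network, and the grouping passed as `groups` would come from the discovery step rather than being fixed by hand.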

Code Implementations: 1 repo
