Regularizing Model-Based Planning with Energy-Based Models
This work addresses sample efficiency in reinforcement learning for robotics and control, though it appears incremental because it builds on existing model-based planning methods.
The paper tackles planning with inaccurate learned dynamics models in model-based reinforcement learning. It regularizes planning using energy estimates of state transitions and achieves competitive performance on continuous control tasks such as Half-cheetah and Ant with just a few minutes of experience.
Model-based reinforcement learning could enable sample-efficient learning by quickly acquiring rich knowledge about the world and using it to improve behaviour without additional data. Learned dynamics models can be used directly for planning actions, but this has been challenging because of inaccuracies in the learned models. In this paper, we focus on planning with learned dynamics models and propose regularizing the planning using energy estimates of state transitions in the environment. We visually demonstrate the effectiveness of the proposed method and show that an energy estimator trained off-policy can effectively regularize planning with pre-trained dynamics models. Further, we demonstrate that the proposed method enables sample-efficient learning, achieving competitive performance in challenging continuous control tasks such as Half-cheetah and Ant with just a few minutes of experience.
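The regularized planning objective described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The planner here is plain random shooting, chosen only for simplicity. The names `learned_model`, `displacement_reward`, `toy_energy` and the weight `alpha` are hypothetical toy choices. The hand-written energy function stands in for an energy estimator that would in practice be trained off-policy on collected transitions. Each candidate action sequence is rolled out through the learned model and scored by predicted reward minus `alpha` times the summed transition energy, so plans that exploit regions where the model is unreliable are penalized.

```python
import random
from typing import Callable, List, Tuple

State = float
Action = float
Dynamics = Callable[[State, Action], State]
TransitionFn = Callable[[State, Action, State], float]


def rollout_terms(state: State, actions: List[Action], dynamics: Dynamics,
                  reward: TransitionFn, energy: TransitionFn) -> Tuple[float, float]:
    """Roll the learned model forward; return (total reward, total energy)."""
    total_reward = 0.0
    total_energy = 0.0
    s = state
    for a in actions:
        s_next = dynamics(s, a)  # model prediction, not the real environment
        total_reward += reward(s, a, s_next)
        total_energy += energy(s, a, s_next)  # high energy = unfamiliar transition
        s = s_next
    return total_reward, total_energy


def plan(state: State, dynamics: Dynamics, reward: TransitionFn,
         energy: TransitionFn, alpha: float, horizon: int = 5,
         n_candidates: int = 500, seed: int = 0) -> List[Action]:
    """Random-shooting planner maximizing reward - alpha * energy."""
    rng = random.Random(seed)
    best_seq: List[Action] = []
    best_score = float("-inf")
    for _ in range(n_candidates):
        seq = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        r, e = rollout_terms(state, seq, dynamics, reward, energy)
        score = r - alpha * e  # energy term acts as the regularizer
        if score > best_score:
            best_score, best_seq = score, seq
    return best_seq


def learned_model(s: State, a: Action) -> State:
    # Hypothetical learned model: assume the true system is s + a; the model
    # is accurate for small actions but over-predicts large ones.
    return s + a + 2.0 * a ** 3


def displacement_reward(s: State, a: Action, s_next: State) -> float:
    # Hypothetical task: move as far right as possible.
    return s_next - s


def toy_energy(s: State, a: Action, s_next: State) -> float:
    # Hypothetical stand-in for a trained energy estimator: low energy near
    # transitions seen in data (s_next close to s + a), high energy elsewhere.
    return (s_next - s - a) ** 2


if __name__ == "__main__":
    for alpha in (0.0, 1.0):
        seq = plan(0.0, learned_model, displacement_reward, toy_energy, alpha)
        r, e = rollout_terms(0.0, seq, learned_model, displacement_reward, toy_energy)
        print(f"alpha={alpha}: predicted reward={r:.2f}, energy={e:.2f}")
```

With `alpha=0.0` the planner favours large actions whose effect the toy model over-predicts; with `alpha=1.0` the energy term pulls it toward transitions resembling the data, trading some model-predicted reward for lower energy. Because both runs score the same candidate set (same seed), the regularized plan's total energy can never exceed the unregularized plan's.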