LG | AI | Oct 12, 2022

A Unified Framework for Alternating Offline Model Training and Policy Learning

Apple
arXiv:2210.05922v1 | 18 citations | h-index: 19
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in offline MBRL for robotics and control applications, offering an incremental improvement over existing methods.

The paper tackles the objective mismatch between dynamic models and policies in offline model-based reinforcement learning by proposing an iterative framework that alternates between model training and policy learning, achieving competitive performance on continuous-control datasets.

In offline model-based reinforcement learning (offline MBRL), we learn a dynamic model from historically collected data and then use the learned model and the fixed dataset for policy learning, without further interaction with the environment. Offline MBRL algorithms can improve the efficiency and stability of policy learning relative to model-free algorithms. However, in most existing offline MBRL algorithms, the learning objectives for the dynamic model and the policy are isolated from each other. Such an objective mismatch may lead to inferior performance of the learned agents. In this paper, we address this issue by developing an iterative offline MBRL framework in which we maximize a lower bound of the true expected return by alternating between dynamic-model training and policy learning. With the proposed unified model-policy learning framework, we achieve competitive performance on a wide range of continuous-control offline reinforcement learning datasets. Source code is publicly released.
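The alternating structure described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm or its lower-bound objective: the 1-D linear system, the policy-dependent weighting in `fit_model`, the grid search in `improve_policy`, and all names and constants are assumptions chosen only to show how model fitting and policy improvement can feed each other on a fixed dataset.

```python
import math
import random

TRUE_A, TRUE_B = 0.9, 0.5  # toy ground-truth dynamics, hidden from the learner


def make_offline_dataset(n=200, seed=0):
    """Collect (s, u, s_next) tuples once, using random behavior actions."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        s = rng.uniform(-1.0, 1.0)
        u = rng.uniform(-1.0, 1.0)
        s_next = TRUE_A * s + TRUE_B * u + rng.gauss(0.0, 0.01)
        data.append((s, u, s_next))
    return data


def fit_model(data, k):
    """Weighted least squares for s_next ~ a*s + b*u.

    Assumption (illustrative only): weights favor transitions whose action is
    close to what the current policy u = -k*s would choose, so the fitted
    model depends on the policy instead of being trained in isolation.
    """
    saa = sab = sbb = ya = yb = 0.0
    for s, u, s_next in data:
        w = math.exp(-(u + k * s) ** 2)
        saa += w * s * s
        sab += w * s * u
        sbb += w * u * u
        ya += w * s * s_next
        yb += w * u * s_next
    det = saa * sbb - sab * sab  # determinant of the 2x2 normal equations
    a_hat = (ya * sbb - yb * sab) / det
    b_hat = (yb * saa - ya * sab) / det
    return a_hat, b_hat


def model_return(model, k, horizon=20):
    """Return of the policy u = -k*s when rolled out in the learned model."""
    a_hat, b_hat = model
    total = 0.0
    for s0 in (-1.0, -0.5, 0.5, 1.0):
        s = s0
        for _ in range(horizon):
            u = -k * s
            total += -(s * s + 0.1 * u * u)  # toy quadratic reward
            s = a_hat * s + b_hat * u
    return total


def improve_policy(model):
    """Pick the gain with the highest model-predicted return (grid search)."""
    candidates = [i / 10 for i in range(21)]  # gains 0.0, 0.1, ..., 2.0
    return max(candidates, key=lambda k: model_return(model, k))


def alternate(data, rounds=5, k=0.0):
    """Alternate model training and policy learning on a fixed dataset."""
    model = None
    for _ in range(rounds):
        model = fit_model(data, k)   # model step uses the current policy
        k = improve_policy(model)    # policy step uses the current model
    return model, k


model, gain = alternate(make_offline_dataset())
print(model, gain)
```

The environment is never queried after `make_offline_dataset` returns, which mirrors the offline setting; only the dataset and the learned model are used inside the loop.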

Code Implementations: 1 repo
