LG, AI, RO, SY, ML
Aug 12, 2020

Model-Based Offline Planning

arXiv:2008.05556v3
170 citations
Originality: Incremental advance
AI Analysis

The paper addresses the need for more controllable and integrable policies in offline RL for real systems such as robots, though the advance appears incremental because it builds on existing model-based approaches.

The paper tackles the problem of offline reinforcement learning by proposing Model-Based Offline Planning (MBOP), which generates a model from data for direct control through planning, achieving near-optimal policies from as little as 50 seconds of real-time interaction and enabling zero-shot goal-conditioned policies.

Offline learning is a key part of making reinforcement learning (RL) usable in real systems. Offline RL looks at scenarios where there is data from a system's operation, but no direct access to the system when learning a policy. Recent work on training RL policies from offline data has shown results both with model-free policies learned directly from the data and with planning on top of learnt models of the data. Model-free policies tend to be more performant, but are more opaque, harder to command externally, and less easy to integrate into larger systems. We propose an offline learner that generates a model that can be used to control the system directly through planning. This allows us to have easily controllable policies directly from data, without ever interacting with the system. We show the performance of our algorithm, Model-Based Offline Planning (MBOP), on a series of robotics-inspired tasks, and demonstrate its ability to leverage planning to respect environmental constraints. We are able to find near-optimal policies for certain simulated systems from as little as 50 seconds of real-time system interaction, and create zero-shot goal-conditioned policies on a series of environments. An accompanying video can be found here: https://youtu.be/nxGGHdZOFts
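The general idea in the abstract, fitting a model to logged data and then controlling the system by planning through that model, can be illustrated with a toy sketch. This is not the authors' MBOP implementation: the 1-D system, the linear least-squares model, the random-shooting planner, and every function name below are assumptions chosen only to keep the example small and self-contained.

```python
import random

def collect_offline_data(n_steps, rng):
    """Log (x, a, x_next) from a hypothetical 1-D system x' = x + 0.1 * a."""
    data, x = [], 0.0
    for _ in range(n_steps):
        a = rng.uniform(-1.0, 1.0)   # behaviour policy: uniformly random actions
        x_next = x + 0.1 * a         # true dynamics, never shown to the learner
        data.append((x, a, x_next))
        x = x_next
    return data

def fit_linear_model(data):
    """Least-squares fit of x' ~ w_x * x + w_a * a via the 2x2 normal equations."""
    sxx = sum(x * x for x, a, _ in data)
    sxa = sum(x * a for x, a, _ in data)
    saa = sum(a * a for x, a, _ in data)
    sxy = sum(x * y for x, _, y in data)
    say = sum(a * y for _, a, y in data)
    det = sxx * saa - sxa * sxa
    w_x = (sxy * saa - say * sxa) / det
    w_a = (say * sxx - sxy * sxa) / det
    return w_x, w_a

def plan(model, x0, goal, rng, horizon=5, n_candidates=200):
    """Random-shooting planner: roll candidate action sequences through the
    learned model and return the first action of the best-scoring sequence."""
    w_x, w_a = model
    best_return, best_first = float("-inf"), 0.0
    for _ in range(n_candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        x, total = x0, 0.0
        for a in actions:
            x = w_x * x + w_a * a        # predicted next state
            total -= (x - goal) ** 2     # reward: negative squared distance to goal
        if total > best_return:
            best_return, best_first = total, actions[0]
    return best_first

def run_episode(goal, n_steps=60, seed=0):
    """Train on logged data only, then deploy the planner on the toy system."""
    rng = random.Random(seed)
    model = fit_linear_model(collect_offline_data(500, rng))
    x = 0.0
    for _ in range(n_steps):
        a = plan(model, x, goal, rng)
        x = x + 0.1 * a                  # step the true system at deployment time
    return model, x
```

In this sketch the goal enters only through the reward inside `plan`, so calling `run_episode` with a different `goal` reuses the same offline data and model without retraining. That loosely mirrors the controllability the abstract attributes to planning-based policies, under the toy assumptions stated above.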

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
