AI | SY | May 16, 2021

Model-Based Offline Planning with Trajectory Pruning

arXiv:2105.07351v3 | 43 citations
Originality: Incremental advance
AI Analysis

This work addresses practical computational and flexibility issues in offline RL for system control, representing an incremental improvement in model-based planning.

The paper tackles offline reinforcement learning for real-world control tasks by proposing MOPP, a lightweight model-based planning framework. MOPP encourages more aggressive trajectory rollouts guided by a behavior policy learned from data and prunes problematic trajectories, and it reports performance competitive with existing model-based offline planning and RL approaches.

Recent offline reinforcement learning (RL) studies have made much progress toward making RL usable in real-world systems by learning policies from pre-collected datasets without environment interaction. Unfortunately, existing offline RL methods still face many practical challenges in real-world system control tasks, such as computational restrictions during agent training and the requirement of extra control flexibility. The model-based planning framework provides an attractive alternative. However, most model-based planning algorithms are not designed for offline settings. Simply combining the ingredients of offline RL with existing methods either yields over-restrictive planning or leads to inferior performance. We propose a new lightweight model-based offline planning framework, MOPP, which tackles the dilemma between the restrictions of offline learning and high-performance planning. MOPP encourages more aggressive trajectory rollouts guided by the behavior policy learned from data, and prunes problematic trajectories to avoid potential out-of-distribution samples. Experimental results show that MOPP provides competitive performance compared with existing model-based offline planning and RL approaches.
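
The rollout-then-prune idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how "problematic" trajectories are detected, so the sketch assumes disagreement within an ensemble of learned dynamics models as the out-of-distribution signal. The function name `plan_with_pruning` and the parameters `noise_scale` and `max_disagreement` are hypothetical, as are the callables passed in.

```python
import numpy as np

def plan_with_pruning(state, behavior_policy, ensemble, reward_fn,
                      n_traj=64, horizon=10, noise_scale=2.0,
                      max_disagreement=0.5, rng=None):
    """Toy planner: sample rollouts around a behavior policy, prune, pick the best.

    behavior_policy(s) -> (mean_action, std_action)   (assumed interface)
    ensemble: list of callables m(s, a) -> next_state (assumed interface)
    reward_fn(s, a) -> float                          (assumed interface)
    """
    rng = np.random.default_rng() if rng is None else rng
    state = np.asarray(state, dtype=float)
    kept = []  # (return, first_action) for trajectories that survive pruning

    for _ in range(n_traj):
        s, total, first_action, worst = state, 0.0, None, 0.0
        for _ in range(horizon):
            mean, std = behavior_policy(s)
            # "Aggressive" rollout: inflate the behavior policy's std (assumption).
            a = mean + noise_scale * std * rng.standard_normal(mean.shape)
            preds = np.stack([m(s, a) for m in ensemble])
            # Ensemble disagreement as a proxy for out-of-distribution steps.
            worst = max(worst, float(preds.std(axis=0).max()))
            if first_action is None:
                first_action = a
            total += float(reward_fn(s, a))
            s = preds.mean(axis=0)
        if worst <= max_disagreement:  # prune trajectories the models disagree on
            kept.append((total, first_action))

    if not kept:
        # Everything was pruned: fall back to the behavior policy's mean action.
        return behavior_policy(state)[0]
    return max(kept, key=lambda item: item[0])[1]
```

In this sketch a larger `noise_scale` widens exploration around the behavior policy, while `max_disagreement` bounds how far rollouts may drift from regions where the models agree; the two knobs stand in for the trade-off between aggressive rollout and pruning that the abstract describes.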

Code Implementations: 1 repo