LG · Jan 25, 2022

MOORe: Model-based Offline-to-Online Reinforcement Learning

arXiv:2201.10070v1 · 14 citations
AI Analysis

The paper addresses the challenge of safe, fast adaptation for real-world RL deployment, though it appears incremental because it builds on existing offline-to-online methods.

The paper tackles the smooth and efficient transfer of offline-trained reinforcement learning policies to online deployment. It proposes MOORe, which uses a prioritized sampling scheme to dynamically adjust the mix of offline and online data, and reports that it significantly outperforms existing methods on the D4RL benchmark.

With the success of offline reinforcement learning (RL), offline-trained RL policies have the potential to be further improved when deployed online. A smooth transfer of the policy matters for safe real-world deployment. In addition, fast adaptation of the policy plays a vital role in practical online performance improvement. To tackle these challenges, we propose a simple yet efficient algorithm, Model-based Offline-to-Online Reinforcement Learning (MOORe), which employs a prioritized sampling scheme that can dynamically adjust the offline and online data for smooth and efficient online adaptation of the policy. We provide a theoretical foundation for our algorithm's design. Experimental results on the D4RL benchmark show that our algorithm transfers smoothly from the offline to the online stage while enabling sample-efficient online adaptation, and that it significantly outperforms existing methods.
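
To make the idea of dynamically adjusting offline and online data concrete, here is a minimal sketch of a replay sampler whose share of online transitions grows as online training proceeds. The abstract does not specify MOORe's prioritization rule, so this is not the paper's method: the linear schedule, the function names (`online_fraction`, `sample_mixed_batch`), and every parameter value are assumptions chosen for illustration only.

```python
import random


def online_fraction(step, warmup_steps=1000, start=0.1, end=0.9):
    """Share of each batch drawn from online data at a given step.

    Hypothetical linear schedule (an assumption, not taken from the paper):
    begin mostly offline for a smooth transfer, end mostly online for
    fast adaptation.
    """
    progress = min(max(step / warmup_steps, 0.0), 1.0)
    return start + (end - start) * progress


def sample_mixed_batch(offline_data, online_data, batch_size, step, rng=random):
    """Draw a batch mixing offline and online transitions.

    Falls back to purely offline data while the online buffer is empty.
    """
    if online_data:
        n_online = round(batch_size * online_fraction(step))
    else:
        n_online = 0
    n_offline = batch_size - n_online

    # Sample with replacement from each buffer, then shuffle the result.
    batch = [rng.choice(online_data) for _ in range(n_online)]
    batch += [rng.choice(offline_data) for _ in range(n_offline)]
    rng.shuffle(batch)
    return batch
```

Early in online training the batch is dominated by offline transitions, which keeps updates close to the offline-trained policy; later the online share rises so that newly collected experience drives adaptation. A prioritized scheme would replace the uniform `rng.choice` draws with per-transition weights.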
