LG · AI · RO · May 25, 2023

PROTO: Iterative Policy Regularized Offline-to-Online Reinforcement Learning

arXiv:2305.15669v1 · 31 citations · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of combining offline pretraining with online finetuning in RL to improve sample efficiency and policy performance. It appears incremental, since it builds on existing methods by adding a regularization term.

The paper tackles the suboptimal performance, limited adaptability, and computational inefficiency of existing offline-to-online reinforcement learning methods by proposing PROTO, a framework built on iterative policy regularization. In the reported experiments, PROTO outperforms state-of-the-art baselines.

Offline-to-online reinforcement learning (RL), by combining the benefits of offline pretraining and online finetuning, promises enhanced sample efficiency and policy performance. However, existing methods, effective as they are, suffer from suboptimal performance, limited adaptability, and unsatisfactory computational efficiency. We propose a novel framework, PROTO, which overcomes the aforementioned limitations by augmenting the standard RL objective with an iteratively evolving regularization term. Performing a trust-region-style update, PROTO yields stable initial finetuning and optimal final performance by gradually evolving the regularization term to relax the constraint strength. By adjusting only a few lines of code, PROTO can bridge any offline policy pretraining and standard off-policy RL finetuning to form a powerful offline-to-online RL pathway, birthing great adaptability to diverse methods. Simple yet elegant, PROTO imposes minimal additional computation and enables highly efficient online finetuning. Extensive experiments demonstrate that PROTO achieves superior performance over SOTA baselines, offering an adaptable and efficient offline-to-online RL framework.
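To make the "iteratively evolving regularization term" concrete, here is a minimal toy sketch, not the authors' implementation. It assumes the regularizer is a KL divergence to the previous iterate's policy, that the constraint strength `alpha` is relaxed on a linear schedule, and that the action space is a single-state discrete bandit with fixed Q-values; the abstract confirms none of these details. The function names `kl_regularized_step`, `linear_decay`, and `finetune` are hypothetical.

```python
import math


def kl_regularized_step(prev_policy, q_values, alpha):
    """Maximize sum_a pi(a) * Q(a) - alpha * KL(pi || prev_policy).

    For a discrete action set the maximizer has the closed form
    pi(a) proportional to prev_policy(a) * exp(Q(a) / alpha).
    Assumption: KL regularization toward the previous iterate.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    max_q = max(q_values)  # subtract the max for numerical stability
    weights = [p * math.exp((q - max_q) / alpha)
               for p, q in zip(prev_policy, q_values)]
    total = sum(weights)
    return [w / total for w in weights]


def linear_decay(alpha_start, alpha_end, step, num_steps):
    """Hypothetical schedule: move alpha linearly from start to end."""
    frac = min(step / max(num_steps - 1, 1), 1.0)
    return alpha_start + frac * (alpha_end - alpha_start)


def finetune(pretrained_policy, q_values,
             alpha_start=5.0, alpha_end=0.1, num_steps=20):
    """Run trust-region-style updates, each anchored to the previous policy.

    Q-values are held fixed here for illustration; a real agent would
    re-estimate them from online experience at every iteration.
    """
    policy = list(pretrained_policy)
    history = [policy]
    for step in range(num_steps):
        alpha = linear_decay(alpha_start, alpha_end, step, num_steps)
        policy = kl_regularized_step(policy, q_values, alpha)
        history.append(policy)
    return history


if __name__ == "__main__":
    # The pretrained policy prefers action 0, but action 2 has the highest Q.
    history = finetune([0.7, 0.2, 0.1], [0.0, 1.0, 2.0])
    print("after first update:", [round(p, 3) for p in history[1]])
    print("after last update: ", [round(p, 3) for p in history[-1]])
```

With a large initial `alpha`, the first update stays close to the pretrained policy, which mirrors the stable initial finetuning described in the abstract. As `alpha` shrinks, each step moves further from its anchor, and because the anchor itself is the previous iterate rather than the fixed pretrained policy, the policy can eventually concentrate on the highest-value action.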

Code Implementations: 1 repo
