LG · Jul 11, 2025

Online Pre-Training for Offline-to-Online Reinforcement Learning

arXiv:2507.08387v1 · 4 citations
Originality: Incremental advance
AI Analysis

For researchers and practitioners, this work addresses a key bottleneck in offline-to-online RL and offers a significant but incremental improvement over existing methods.

The paper tackles a known failure mode in reinforcement learning: agents pre-trained offline often underperform during online fine-tuning because their value estimates are inaccurate. It proposes a novel method, Online Pre-Training (OPT), which achieves an average 30% performance improvement across various D4RL environments.

Offline-to-online reinforcement learning (RL) aims to integrate the complementary strengths of offline and online RL by pre-training an agent offline and subsequently fine-tuning it through online interactions. However, recent studies reveal that offline pre-trained agents often underperform during online fine-tuning due to inaccurate value estimation caused by distribution shift, with random initialization proving more effective in certain cases. In this work, we propose a novel method, Online Pre-Training for Offline-to-Online RL (OPT), explicitly designed to address the issue of inaccurate value estimation in offline pre-trained agents. OPT introduces a new learning phase, Online Pre-Training, which allows the training of a new value function tailored specifically for effective online fine-tuning. Implementation of OPT on TD3 and SPOT demonstrates an average 30% improvement in performance across a wide range of D4RL environments, including MuJoCo, Antmaze, and Adroit.
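The three-phase structure described in the abstract (offline pre-training, then online pre-training of a new value function, then online fine-tuning) can be illustrated with a toy sketch. Apart from the phase names, everything here is an assumption made for illustration and is not the paper's algorithm, which builds on TD3 and SPOT: the 3-armed bandit, the pessimistic default of 0.0 for actions missing from the offline data, the epsilon-greedy data collection, and the hypothetical blending weight `kappa` with its linear schedule are all our own choices.

```python
import random
from typing import List, Tuple

# Hypothetical toy problem: a 3-armed bandit (one-step RL), so a "value
# function" is just a table of estimated returns, one entry per action.
TRUE_MEANS = [0.2, 0.5, 0.8]  # arm 2 is best but absent from the offline data


def pull(arm: int, rng: random.Random) -> float:
    """Sample a noisy reward for `arm`."""
    return TRUE_MEANS[arm] + rng.gauss(0.0, 0.1)


def fit_q(data: List[Tuple[int, float]], default: float = 0.0) -> List[float]:
    """Average observed reward per arm; arms never seen get `default`.

    Assumption of this sketch: a default of 0.0 stands in for the
    pessimism offline RL applies to out-of-distribution actions.
    """
    sums = [0.0] * len(TRUE_MEANS)
    counts = [0] * len(TRUE_MEANS)
    for arm, reward in data:
        sums[arm] += reward
        counts[arm] += 1
    return [s / c if c else default for s, c in zip(sums, counts)]


def greedy(q: List[float]) -> int:
    """Index of the highest estimated value (first one on ties)."""
    return max(range(len(q)), key=lambda a: q[a])


def eps_greedy(q: List[float], eps: float, rng: random.Random) -> int:
    """With probability eps pick a uniformly random arm, else act greedily."""
    if rng.random() < eps:
        return rng.randrange(len(q))
    return greedy(q)


def blend(q_off: List[float], q_on: List[float], kappa: float) -> List[float]:
    """Hypothetical convex combination of the offline and online estimates."""
    return [(1 - kappa) * a + kappa * b for a, b in zip(q_off, q_on)]


def run_opt_sketch(seed: int = 0, n_offline: int = 100, n_online_pre: int = 300,
                   n_finetune: int = 300, eps: float = 0.3) -> dict:
    rng = random.Random(seed)

    # Phase 1: offline pre-training on a fixed dataset covering only arms 0 and 1.
    offline_data = [(a, pull(a, rng)) for a in (0, 1) for _ in range(n_offline // 2)]
    q_off = fit_q(offline_data)

    # Phase 2: online pre-training. A new value table q_on is fit only on fresh
    # online samples, collected by acting eps-greedily on the fixed q_off.
    online_data: List[Tuple[int, float]] = []
    for _ in range(n_online_pre):
        a = eps_greedy(q_off, eps, rng)
        online_data.append((a, pull(a, rng)))
    q_on = fit_q(online_data)

    # Phase 3: online fine-tuning. Act on a blend whose weight kappa moves
    # linearly from 0.5 to 1.0 toward q_on (our own schedule); q_off stays
    # frozen here for brevity, and q_on is refit on all online data each step.
    for t in range(n_finetune):
        kappa = 0.5 + 0.5 * t / (n_finetune - 1)
        a = eps_greedy(blend(q_off, q_on, kappa), eps, rng)
        online_data.append((a, pull(a, rng)))
        q_on = fit_q(online_data)

    return {"q_off": q_off, "q_on": q_on,
            "final_action": greedy(blend(q_off, q_on, 1.0))}


if __name__ == "__main__":
    print(run_opt_sketch())
```

In this toy setting the offline table `q_off` keeps arm 2 at its pessimistic default because the dataset never contains it, while the separately trained `q_on` learns arm 2's value from online samples. Shifting weight toward `q_on` therefore lets the agent move past the offline estimate's blind spot, which is the intuition behind training a new value function rather than reusing the offline one.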

