LGAI · Oct 30, 2023

GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models

arXiv:2310.20025v3 · 18 citations · h-index: 13
Originality: Incremental advance
AI Analysis

This work addresses limitations of offline GCRL in robotics and AI applications and offers a novel method that improves data efficiency and generalization. The advance is incremental, however, because it builds on existing model-based and planning approaches.

The paper tackles offline goal-conditioned reinforcement learning (GCRL) with GOPlan, a model-based framework that plans with learned models to improve policy optimization. It reports state-of-the-art performance on navigation and manipulation tasks, stronger results under small data budgets, and better generalization to out-of-distribution goals.

Offline Goal-Conditioned RL (GCRL) offers a feasible paradigm for learning general-purpose policies from diverse, multi-task offline datasets. Despite notable recent progress, the predominant offline GCRL methods, which are mainly model-free, struggle to handle limited data and to generalize to unseen goals. In this work, we propose Goal-conditioned Offline Planning (GOPlan), a novel model-based framework with two key phases: (1) pretraining a prior policy capable of capturing the multi-modal action distributions within the multi-goal dataset; and (2) employing a reanalysis method with planning to generate imagined trajectories for fine-tuning policies. Specifically, we base the prior policy on an advantage-weighted conditioned generative adversarial network, which facilitates distinct mode separation and mitigates the pitfalls of out-of-distribution (OOD) actions. For further policy optimization, the reanalysis method generates high-quality imagined data by planning with learned models for both intra-trajectory and inter-trajectory goals. Through thorough experimental evaluations, we demonstrate that GOPlan achieves state-of-the-art performance on various offline multi-goal navigation and manipulation tasks. Moreover, our results highlight GOPlan's superior ability to handle small data budgets and to generalize to OOD goals.
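The abstract gives no implementation details, so the following is only a minimal, hypothetical Python sketch of the reanalysis phase in a toy 2-D navigation task. It assumes that a small ensemble of dynamics functions stands in for the learned models, and that a noisy "step toward the goal" policy stands in for the pretrained prior. The sketch swaps in a plain random-shooting planner with an ensemble-disagreement penalty, which is an assumption and not necessarily GOPlan's planner. Every name (`make_ensemble`, `prior_policy`, `advantage_weight`, `plan`, `imagine`, `reanalyze`) is illustrative. The sketch generates imagined, goal-labelled transitions for one intra-trajectory goal (a later state of the same trajectory) and one inter-trajectory goal (a state from a different trajectory).

```python
import math
import random

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def mean_state(states):
    n = len(states)
    return (sum(s[0] for s in states) / n, sum(s[1] for s in states) / n)

def make_ensemble(biases):
    # Hypothetical stand-in for learned dynamics models: each member predicts
    # s' = s + a, shifted by a member-specific bias that mimics model error.
    return [lambda s, a, b=b: (s[0] + a[0] + b, s[1] + a[1] - b) for b in biases]

def prior_policy(state, goal, rng, max_step=0.5, noise=0.1):
    # Stand-in for the pretrained goal-conditioned prior (NOT a conditional GAN):
    # a noisy step of length at most max_step toward the goal.
    d = dist(state, goal)
    if d < 1e-9:
        return (rng.gauss(0.0, noise), rng.gauss(0.0, noise))
    step = min(max_step, d)
    return ((goal[0] - state[0]) / d * step + rng.gauss(0.0, noise),
            (goal[1] - state[1]) / d * step + rng.gauss(0.0, noise))

def advantage_weight(advantage, beta=1.0, max_weight=20.0):
    # Assumed AWR-style weight exp(A / beta), clipped for stability; such a weight
    # would scale each sample's contribution to the prior policy's training loss.
    return min(math.exp(advantage / beta), max_weight)

def plan(ensemble, state, goal, rng, horizon=5, n_candidates=32, penalty=1.0):
    # Random shooting: sample action sequences from the prior, roll each one out
    # in every ensemble member, and keep the sequence whose mean final state is
    # closest to the goal, penalized by disagreement between members.
    best_actions, best_score = None, -math.inf
    for _ in range(n_candidates):
        states = [state] * len(ensemble)
        actions = []
        for _ in range(horizon):
            a = prior_policy(mean_state(states), goal, rng)
            actions.append(a)
            states = [f(s, a) for f, s in zip(ensemble, states)]
        final = mean_state(states)
        disagreement = max(dist(s, final) for s in states)
        score = -dist(final, goal) - penalty * disagreement
        if score > best_score:
            best_actions, best_score = actions, score
    return best_actions

def imagine(ensemble, state, goal, actions):
    # Roll the chosen actions through the ensemble mean to produce imagined
    # goal-labelled transitions (s, a, s', g) for policy fine-tuning.
    transitions = []
    for a in actions:
        nxt = mean_state([f(state, a) for f in ensemble])
        transitions.append((state, a, nxt, goal))
        state = nxt
    return transitions

def reanalyze(trajectories, ensemble, rng, horizon=5):
    # Pick a start state, then an intra-trajectory goal (a later state of the
    # same trajectory) and an inter-trajectory goal (a state of another one).
    i = rng.randrange(len(trajectories))
    traj = trajectories[i]
    t = rng.randrange(len(traj) - 1)
    start = traj[t]
    intra_goal = traj[rng.randrange(t + 1, len(traj))]
    j = rng.choice([k for k in range(len(trajectories)) if k != i])
    inter_goal = rng.choice(trajectories[j])
    data = []
    for goal in (intra_goal, inter_goal):
        actions = plan(ensemble, start, goal, rng, horizon=horizon)
        data.extend(imagine(ensemble, start, goal, actions))
    return data

if __name__ == "__main__":
    rng = random.Random(0)
    ensemble = make_ensemble([0.0, 0.02, -0.02])
    trajectories = [[(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)],
                    [(0.0, 2.0), (0.5, 2.5), (1.0, 3.0)]]
    data = reanalyze(trajectories, ensemble, rng)
    print(len(data), "imagined transitions")
```

Running the script prints `10 imagined transitions`: two goals, each with a five-step imagined rollout. The disagreement penalty is one common way to keep imagined data within the region where the learned models are reliable. The actual method may use a different uncertainty measure or planner.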
