LG, AI · Nov 28, 2023

Goal-conditioned Offline Planning from Curious Exploration

arXiv:2311.16996v1 · 2 citations · h-index: 33
Originality: Incremental advance
AI Analysis

This paper addresses offline goal-conditioned planning for reinforcement learning agents and offers a method for correcting estimation artifacts in learned value functions. The contribution is incremental, since it builds on existing curiosity-driven exploration and planning techniques.

The paper tackles the problem of extracting goal-conditioned behavior from unsupervised, curiosity-driven exploration without additional environment interaction, and finds that combining model-based planning with graph-based value aggregation significantly improves zero-shot goal-reaching performance across diverse simulated environments.
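The idea of graph-based value aggregation can be illustrated with a small sketch. This is one plausible instantiation under stated assumptions, not the authors' algorithm: it assumes a learned value can be read as a distance estimate (d = -V), trusts only short-range estimates as graph edges, and chains them with Floyd-Warshall shortest paths. The helper `aggregate_values`, the threshold `max_edge`, and the toy estimator `noisy_learned_dist` (accurate for neighbors, inflated 3x beyond one step) are all hypothetical.

```python
import math

def aggregate_values(states, learned_dist, max_edge):
    """Hypothetical graph-based aggregation of learned distance estimates.

    states:       states sampled from the exploration data
    learned_dist: callable (s, g) -> distance estimate, e.g. -V(s, g)
    max_edge:     only estimates at or below this threshold become edges

    Returns an n x n nested list of shortest-path distances (math.inf if
    unreachable). An aggregated value is then V_agg(s_i, s_j) = -dist[i][j].
    """
    n = len(states)
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        for j in range(n):
            if i != j:
                d = learned_dist(states[i], states[j])
                if d <= max_edge:  # keep only trusted short-range estimates
                    dist[i][j] = d
    # Floyd-Warshall: chain short-range estimates into long-range distances.
    for k in range(n):
        for i in range(n):
            if dist[i][k] == math.inf:
                continue
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist

# Toy setup (assumption): five states on a line, true distance is |a - b|.
states = [(float(x),) for x in range(5)]

def noisy_learned_dist(s, g):
    """Hypothetical learned estimate: accurate nearby, inflated 3x farther out."""
    true = abs(s[0] - g[0])
    return true if true <= 1.0 else 3.0 * true

dist = aggregate_values(states, noisy_learned_dist, max_edge=1.0)
print(noisy_learned_dist(states[0], states[4]), dist[0][4])  # 12.0 4.0
```

In this toy case the raw long-range estimate between the two end states is 12.0, while the aggregated shortest-path distance recovers the true value 4.0. The threshold is the key design choice: too small and the graph disconnects, too large and erroneous long-range estimates leak back in as edges.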

Curiosity has established itself as a powerful exploration strategy in deep reinforcement learning. Notably, leveraging expected future novelty as intrinsic motivation has been shown to efficiently generate exploratory trajectories, as well as a robust dynamics model. We consider the challenge of extracting goal-conditioned behavior from the products of such unsupervised exploration techniques, without any additional environment interaction. We find that conventional goal-conditioned reinforcement learning approaches for extracting a value function and policy fall short in this difficult offline setting. By analyzing the geometry of optimal goal-conditioned value functions, we relate this issue to a specific class of estimation artifacts in learned values. In order to mitigate their occurrence, we propose to combine model-based planning over learned value landscapes with a graph-based value aggregation scheme. We show how this combination can correct both local and global artifacts, obtaining significant improvements in zero-shot goal-reaching performance across diverse simulated environments.
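Model-based planning over a learned value landscape can be sketched as receding-horizon control. The sketch below is a hedged illustration, not the paper's planner: it replaces sampling-based optimizers (such as CEM) with exhaustive search over a tiny discrete action set, scores each imagined trajectory by the sum of learned values along it, and executes only the first action before replanning. The functions `plan_first_action`, `run_mpc`, `model`, and `value` are hypothetical, and the toy value has a spurious local optimum at state 0 that stands in for a local estimation artifact.

```python
from itertools import product

def plan_first_action(state, goal, model, value, actions, horizon):
    """Exhaustive-search planning step (a stand-in for sampling-based planners).

    Each action sequence of length `horizon` is rolled out in the model and
    scored by the sum of learned values along the predicted trajectory; the
    first action of the best-scoring sequence is returned.
    """
    best_score, best_action = float("-inf"), None
    for seq in product(actions, repeat=horizon):
        s, score = state, 0.0
        for a in seq:
            s = model(s, a)
            score += value(s, goal)
        if score > best_score:
            best_score, best_action = score, seq[0]
    return best_action

def run_mpc(start, goal, model, value, actions, horizon, max_steps=20):
    """Receding-horizon control: replan every step, execute only the first action."""
    s, trajectory = start, [start]
    for _ in range(max_steps):
        if s == goal:
            break
        s = model(s, plan_first_action(s, goal, model, value, actions, horizon))
        trajectory.append(s)
    return trajectory

def model(s, a):
    """Hypothetical learned dynamics; here exact integer moves on a line."""
    return s + a

def value(s, g):
    """Hypothetical learned value: -|s - g|, overestimated at s = 0 (artifact)."""
    return -4.5 if s == 0 else -abs(s - g)

# One-step greedy control gets stuck at the spurious optimum; a 3-step
# horizon looks past it and reaches the goal.
print(run_mpc(0, 6, model, value, (-1, 0, 1), horizon=1, max_steps=5))  # [0, 0, 0, 0, 0, 0]
print(run_mpc(0, 6, model, value, (-1, 0, 1), horizon=3))  # [0, 1, 2, 3, 4, 5, 6]
```

Scoring the whole imagined trajectory rather than only its final state also breaks ties in favor of reaching the goal early, which keeps the controller from stalling one step short of the goal in this toy setting.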
