LG · Feb 16, 2024

Learning Goal-Conditioned Policies from Sub-Optimal Offline Data via Metric Learning

arXiv:2402.10820v2 · 5 citations · h-index: 11
Originality: Incremental advance
AI Analysis

This paper targets researchers and practitioners in offline RL, addressing the problem of improving goal-conditioned reinforcement learning from imperfect data.

The paper tackles learning optimal goal-conditioned policies from sub-optimal offline datasets by using metric learning to approximate the optimal value function, and shows that its method consistently outperforms prior state-of-the-art methods in this setting.

We address the problem of learning optimal behavior from sub-optimal datasets for goal-conditioned offline reinforcement learning. To do so, we propose the use of metric learning to approximate the optimal value function for goal-conditioned offline RL problems under sparse rewards, invertible actions, and deterministic transitions. We introduce distance monotonicity, a property that lets representations recover optimality, and propose an optimization objective that leads to such a property. We use the proposed value function to guide the learning of a policy in an actor-critic fashion, a method we name MetricRL. Experimentally, we show that our method estimates optimal behaviors from severely sub-optimal offline datasets without suffering from out-of-distribution estimation errors. We demonstrate that MetricRL consistently outperforms prior state-of-the-art goal-conditioned RL methods in learning optimal policies from sub-optimal offline datasets.
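To make the link between distances and optimal values concrete, here is a minimal sketch on a toy problem. It is not the MetricRL implementation: the grid, the discount factor `gamma`, the reward convention (a score of `gamma ** d` for a state `d` steps from the goal), and all function names are assumptions chosen for illustration. The exact breadth-first distance stands in for what a learned representation would have to approximate. The sketch shows that in a deterministic world with invertible moves, acting greedily on any score that decreases with the true distance to the goal yields a shortest path.

```python
from collections import deque

# Hypothetical 4x4 grid world; '#' cells are walls. Moves are deterministic
# and invertible (every move can be undone by the opposite move).
GRID = [
    "....",
    ".##.",
    ".#..",
    "....",
]
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def neighbors(state):
    """Yield the free cells reachable from `state` in one step."""
    r, c = state
    for dr, dc in MOVES:
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]) and GRID[nr][nc] != "#":
            yield (nr, nc)


def distances_to(goal):
    """Breadth-first search: minimum number of steps from each cell to `goal`."""
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        cur = queue.popleft()
        for nxt in neighbors(cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist


def greedy_rollout(start, goal, score):
    """Follow the neighbor with the highest score until the goal is reached."""
    path = [start]
    while path[-1] != goal:
        path.append(max(neighbors(path[-1]), key=score))
    return path


goal, start = (3, 3), (0, 0)
dist = distances_to(goal)
gamma = 0.9  # assumed discount factor

# Two scores that both decrease as the true distance grows.
path_value = greedy_rollout(start, goal, lambda s: gamma ** dist[s])
path_neg_d = greedy_rollout(start, goal, lambda s: -dist[s])

print(len(path_value) - 1, dist[start])  # rollout length vs. shortest distance
print(path_value == path_neg_d)          # both scores pick the same path
```

Both scores order states the same way, so they induce the same greedy path. This suggests why a representation only needs to preserve the ordering of distances to the goal, rather than their exact values, for the resulting behavior to be optimal in this toy setting.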

