LG, AI · Feb 17, 2023

Swapped goal-conditioned offline reinforcement learning

arXiv:2302.08865v1 · 2 citations · h-index: 20
Originality: Incremental advance
AI Analysis

This paper addresses generalization challenges in offline goal-conditioned RL for robotics and control applications, and represents a strong incremental improvement.

The paper tackles the problem of overfitting in offline goal-conditioned reinforcement learning by proposing a goal-swapping procedure to generate additional trajectories and a deterministic Q-advantage policy gradient method to reduce noise and extrapolation errors. The method outperforms state-of-the-art approaches on benchmark tasks and achieves good performance on challenging dexterous in-hand manipulation tasks where prior methods failed.

Offline goal-conditioned reinforcement learning (GCRL) can be challenging due to overfitting to the given dataset. To generalize agents' skills outside the given dataset, we propose a goal-swapping procedure that generates additional trajectories. To alleviate the problem of noise and extrapolation errors, we present a general offline reinforcement learning method called deterministic Q-advantage policy gradient (DQAPG). In the experiments, DQAPG outperforms state-of-the-art goal-conditioned offline RL methods in a wide range of benchmark tasks, and goal-swapping further improves the test results. Notably, the proposed method obtains good performance on the challenging dexterous in-hand manipulation tasks on which prior methods failed.
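
The goal-swapping idea can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how goals are swapped or how rewards are assigned, so the `Transition` layout, the `sparse_reward` function (0 on reaching the goal, -1 otherwise), and the choice to permute goals across a batch are all assumptions made for illustration.

```python
import random
from typing import Callable, List, NamedTuple, Optional, Sequence, Tuple

State = Tuple[float, ...]
Goal = Tuple[float, ...]


class Transition(NamedTuple):
    """Hypothetical goal-conditioned transition record (not from the paper)."""
    state: State
    action: int
    next_state: State
    goal: Goal
    reward: float


def sparse_reward(next_state: State, goal: Goal, tol: float = 1e-6) -> float:
    """Assumed sparse reward: 0.0 if next_state matches the goal, else -1.0."""
    reached = all(abs(s - g) <= tol for s, g in zip(next_state, goal))
    return 0.0 if reached else -1.0


def swap_goals(
    transitions: Sequence[Transition],
    reward_fn: Callable[[State, Goal], float] = sparse_reward,
    rng: Optional[random.Random] = None,
) -> List[Transition]:
    """Return augmented transitions whose goals are permuted across the batch.

    States and actions are left untouched; only the goal is replaced, and the
    reward is recomputed so that it stays consistent with the new goal.
    """
    if rng is None:
        rng = random.Random()
    goals = [t.goal for t in transitions]
    rng.shuffle(goals)  # in-place permutation of the goal list
    return [
        t._replace(goal=g, reward=reward_fn(t.next_state, g))
        for t, g in zip(transitions, goals)
    ]


if __name__ == "__main__":
    batch = [
        Transition((0.0,), 1, (1.0,), (1.0,), 0.0),
        Transition((5.0,), 0, (4.0,), (3.0,), -1.0),
    ]
    augmented = swap_goals(batch, rng=random.Random(0))
    # Train on the original batch together with the augmented copies.
    print(batch + augmented)
```

In this sketch the augmented tuples would be mixed with the original data before training. Whether a swapped goal is actually reachable from the paired state is not checked here; a practical version would need some way to handle unreachable combinations.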

Code Implementations: 1 repo