LG | AI | Dec 9, 2024

Skill-Enhanced Reinforcement Learning Acceleration from Heterogeneous Demonstrations

arXiv:2412.06207v2 | h-index: 2 | ECAI
AI Analysis

This paper addresses slow reinforcement learning training caused by scarce expert demonstration data, a problem that affects researchers and practitioners in robotics and AI.

The paper tackles the limited availability of expert demonstrations in Learning from Demonstration for reinforcement learning by proposing SeRLA, a two-stage method that extracts skill priors from heterogeneous demonstrations and uses them to accelerate training, achieving state-of-the-art performance in early training phases on standard benchmarks.

Learning from Demonstration (LfD) is a well-established problem in Reinforcement Learning (RL), which aims to speed up RL by leveraging expert demonstrations to pre-train the RL agent. However, the limited availability of expert demonstration data often hinders its ability to effectively aid downstream RL. To address this problem, we propose a novel two-stage method dubbed Skill-enhanced Reinforcement Learning Acceleration (SeRLA). In the offline prior learning stage, SeRLA introduces a skill-level adversarial Positive-Unlabeled (PU) learning model that extracts useful skill prior knowledge by learning from both expert demonstrations and general low-cost demonstrations. Building on this, in the downstream online RL stage, it employs a skill-based soft actor-critic algorithm to leverage the acquired priors for efficient training of a skill policy network. In addition, we propose a simple skill-level data enhancement technique to mitigate data sparsity and further improve both skill prior learning and skill policy training. Experiments across multiple standard RL benchmarks demonstrate that SeRLA achieves state-of-the-art performance in accelerating reinforcement learning on downstream tasks, particularly in the early training phase.
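
The abstract does not spell out the PU objective, so the following is only a rough illustration of what positive-unlabeled learning does in general: expert demonstrations play the role of labeled positives, and the low-cost demonstrations are treated as an unlabeled mix of good and bad data. The sketch computes the widely used non-negative PU risk estimate from classifier scores. It is not SeRLA's skill-level adversarial model; the function names, the sigmoid surrogate loss, and the `class_prior` parameter are assumptions chosen for illustration.

```python
import math


def sigmoid_loss(score, label):
    """Sigmoid surrogate loss; small when the sign of score matches label (+1 or -1)."""
    return 1.0 / (1.0 + math.exp(label * score))


def mean(values):
    return sum(values) / len(values)


def nn_pu_risk(pos_scores, unl_scores, class_prior):
    """Non-negative PU risk estimate (generic sketch, not the paper's objective).

    pos_scores:  classifier scores on labeled positives (assumed: expert data)
    unl_scores:  classifier scores on unlabeled samples (assumed: low-cost data)
    class_prior: assumed fraction of positives hidden in the unlabeled set
    """
    pos_as_pos = mean([sigmoid_loss(s, +1) for s in pos_scores])
    pos_as_neg = mean([sigmoid_loss(s, -1) for s in pos_scores])
    unl_as_neg = mean([sigmoid_loss(s, -1) for s in unl_scores])
    # Estimated risk on the true negatives: subtract the positives' share
    # from the unlabeled term, then clamp at zero so it cannot go negative.
    neg_risk = unl_as_neg - class_prior * pos_as_neg
    return class_prior * pos_as_pos + max(0.0, neg_risk)


# Hypothetical scores: higher means "looks more like expert behaviour".
risk = nn_pu_risk(pos_scores=[2.0, 1.5], unl_scores=[-1.0, 0.5, 2.0], class_prior=0.3)
```

The clamp is what makes the estimate "non-negative": without it, a flexible classifier can drive the negative-class term below zero and overfit the small positive set. In practice `class_prior` has to be estimated or tuned, and scores of very large magnitude would need a numerically stable loss.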

