LG | Jan 13, 2025

Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data

Shilong Deng, Zetao Zheng, Hongcai He, Paul Weng, Jie Shao

arXiv:2501.07346v1 | h-index: 2 | AAAI

Originality: Highly original

AI Analysis

This work addresses inefficient online reinforcement learning under sparse rewards. It provides a flexible module that plugs into vanilla off-policy RL algorithms, introduces no domain-specific hyperparameter, and leverages offline data more effectively.

The paper tackles the challenge of learning optimal policies from sparse rewards by meta-learning an objective from offline data. On four sparse-reward MuJoCo tasks, three RL algorithms enhanced with the method significantly outperform state-of-the-art methods.

A major challenge in Reinforcement Learning (RL) is the difficulty of learning an optimal policy from sparse rewards. Prior works enhance online RL with conventional Imitation Learning (IL) via a handcrafted auxiliary objective, at the cost of restricting the RL policy to be sub-optimal when the offline data is generated by a non-expert policy. Instead, to better leverage valuable information in offline data, we develop Generalized Imitation Learning from Demonstration (GILD), which meta-learns an objective that distills knowledge from offline data and instills intrinsic motivation towards the optimal policy. Distinct from prior works that are exclusive to a specific RL algorithm, GILD is a flexible module intended for diverse vanilla off-policy RL algorithms. In addition, GILD introduces no domain-specific hyperparameter and minimal increase in computational cost. In four challenging MuJoCo tasks with sparse rewards, we show that three RL algorithms enhanced with GILD significantly outperform state-of-the-art methods.
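To make concrete why a handcrafted imitation term can hold the RL policy at a sub-optimal point when the offline data is non-expert, here is a minimal toy sketch. It is not the paper's method or code: the one-dimensional quadratic losses and the names `a_opt`, `a_demo`, and `lam` are assumptions chosen only for illustration. The RL term is modeled as (a - a_opt)^2, the handcrafted imitation term as lam * (a - a_demo)^2, and the function returns the action that minimizes their sum.

```python
def regularized_optimum(a_opt: float, a_demo: float, lam: float) -> float:
    """Action minimizing (a - a_opt)**2 + lam * (a - a_demo)**2.

    Setting the derivative 2*(a - a_opt) + 2*lam*(a - a_demo) to zero
    gives a = (a_opt + lam * a_demo) / (1 + lam).
    All three parameters are hypothetical stand-ins, not names from the paper.
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    return (a_opt + lam * a_demo) / (1.0 + lam)


def rl_loss(a: float, a_opt: float) -> float:
    """Toy RL loss: squared distance from the optimal action."""
    return (a - a_opt) ** 2


# Non-expert demonstration: a_demo differs from the optimal action a_opt.
a_opt, a_demo = 1.0, 0.5
no_imitation = regularized_optimum(a_opt, a_demo, lam=0.0)    # pure RL objective
with_imitation = regularized_optimum(a_opt, a_demo, lam=1.0)  # fixed imitation weight
```

In this toy setting, `lam=0.0` recovers the optimal action 1.0, while `lam=1.0` yields 0.75, halfway between the optimal action and the non-expert demonstration. A fixed, handcrafted imitation weight therefore leaves a residual RL loss whenever the demonstrations are sub-optimal, which is the limitation the abstract attributes to prior work.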
