LG · Apr 19, 2024

Data-Incremental Continual Offline Reinforcement Learning

arXiv:2404.12639v3 · 1 citation · h-index: 7
Originality: Incremental advance
AI Analysis

This paper addresses a specific challenge in offline RL for continual learning scenarios, but the advance is incremental because it builds on existing methods such as implicit Q-learning.

The paper tackles the problem of active forgetting in a new continual learning setting called data-incremental continual offline reinforcement learning (DICORL), where an agent learns from a sequence of datasets for a single task. It proposes an algorithm, EREIQL, that relieves this issue and performs well.

In this work, we propose a new continual learning setting: data-incremental continual offline reinforcement learning (DICORL), in which an agent continually learns a sequence of datasets for a single offline reinforcement learning (RL) task, instead of a sequence of offline RL tasks each with its own dataset. We argue that this setting introduces a unique challenge for continual learning: active forgetting, in which the agent actively forgets skills it has already learned. The main cause of active forgetting is the conservative learning that offline RL uses to address the overestimation problem. Under conservative learning, an offline RL method indiscriminately suppresses the value of every action, previously learned or not, unless that action appears in the dataset currently being learned. As a result, inferior data can overwrite superior data purely because of the order in which the datasets are learned. To solve this problem, we propose a new algorithm, experience-replay-based ensemble implicit Q-learning (EREIQL), which uses multiple value networks to reduce the initial value and avoid conservative learning, and uses experience replay to relieve catastrophic forgetting. Our experiments show that EREIQL relieves active forgetting in DICORL and performs well.
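The three ingredients named in the abstract (implicit Q-learning, an ensemble of value networks, and experience replay) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The expectile loss is the standard asymmetric squared loss from implicit Q-learning. Taking the minimum over ensemble members is only an assumed way to aggregate multiple value estimates. The names `expectile_loss`, `ensemble_min`, and `ReplayMixer`, along with the parameters `tau`, `capacity`, and `replay_fraction`, are hypothetical and chosen for illustration.

```python
import random


def expectile_loss(diff, tau=0.7):
    """Asymmetric squared loss from implicit Q-learning (IQL).

    diff is (target - prediction). Positive errors are weighted by tau
    and negative errors by (1 - tau), so tau > 0.5 pulls the prediction
    towards the upper range of the targets.
    """
    weight = tau if diff > 0 else 1.0 - tau
    return weight * diff * diff


def ensemble_min(estimates):
    """Pessimistic aggregate of several value estimates.

    ASSUMPTION: the minimum is used here for illustration only; the
    abstract does not say how EREIQL combines its value networks.
    """
    return min(estimates)


class ReplayMixer:
    """Hypothetical helper that mixes stored transitions from earlier
    datasets into batches drawn from the current dataset."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.rng = random.Random(seed)

    def store(self, dataset):
        # Append the finished dataset, then randomly downsample the
        # whole buffer if it exceeds the capacity.
        self.buffer.extend(dataset)
        if len(self.buffer) > self.capacity:
            self.buffer = self.rng.sample(self.buffer, self.capacity)

    def mixed_batch(self, current_dataset, batch_size, replay_fraction=0.5):
        # Draw part of the batch from the replay buffer (without
        # replacement) and the rest from the current dataset (with
        # replacement).
        n_replay = min(int(batch_size * replay_fraction), len(self.buffer))
        n_new = batch_size - n_replay
        batch = self.rng.sample(self.buffer, n_replay)
        batch += self.rng.choices(current_dataset, k=n_new)
        return batch
```

In this sketch, a learner would call `store` once it finishes a dataset and `mixed_batch` while training on the next one, so that transitions from earlier datasets keep contributing to the value updates. How much replayed data to use, and how the ensemble is actually trained and combined, are details the abstract leaves open.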

