LG, AI | Dec 18, 2023

Prediction and Control in Continual Reinforcement Learning

arXiv:2312.11669v1 | 23 citations | h-index: 13 | NIPS
Originality: Incremental advance
AI Analysis

This work addresses the challenge of adapting to new situations in continual learning for RL agents, though it appears incremental because it builds on existing TD learning and complementary learning systems (CLS) theory.

The paper tackles value function estimation in continual reinforcement learning by decomposing the value function into permanent and transient components that update at different timescales, and reports significant performance improvements on both prediction and control tasks.

Temporal difference (TD) learning is often used to update the estimate of the value function which is used by RL agents to extract useful policies. In this paper, we focus on value function estimation in continual reinforcement learning. We propose to decompose the value function into two components which update at different timescales: a permanent value function, which holds general knowledge that persists over time, and a transient value function, which allows quick adaptation to new situations. We establish theoretical results showing that our approach is well suited for continual learning and draw connections to the complementary learning systems (CLS) theory from neuroscience. Empirically, this approach improves performance significantly on both prediction and control problems.
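As an illustration only, here is a minimal tabular sketch of the decomposition, assuming V(s) = V_P(s) + V_T(s), TD(0) updates applied only to the transient table, and a periodic consolidation step that moves the permanent table toward the combined estimate and then resets the transient table. The class name `PermanentTransientTD`, the step sizes `alpha_transient` and `beta_permanent`, and this particular consolidation rule are hypothetical choices for the example, not the authors' published code.

```python
class PermanentTransientTD:
    """Hypothetical sketch: tabular TD(0) prediction with the value estimate
    split into a slow 'permanent' table and a fast 'transient' table.
    Step sizes and the consolidation rule are assumptions."""

    def __init__(self, n_states, gamma=0.9, alpha_transient=0.5, beta_permanent=0.1):
        self.gamma = gamma
        self.alpha = alpha_transient  # fast timescale (transient table)
        self.beta = beta_permanent    # slow timescale (permanent table)
        self.v_perm = [0.0] * n_states
        self.v_trans = [0.0] * n_states

    def value(self, s):
        # Combined estimate used both for prediction and for bootstrapping.
        return self.v_perm[s] + self.v_trans[s]

    def td_update(self, s, r, s_next, done):
        # TD(0) error computed with the combined value function.
        target = r if done else r + self.gamma * self.value(s_next)
        delta = target - self.value(s)
        # Only the transient component absorbs the fast TD update.
        self.v_trans[s] += self.alpha * delta
        return delta

    def consolidate(self):
        # Slow update: move V_P toward V_P + V_T, i.e. by beta * V_T,
        # then reset V_T so it can adapt quickly to the next situation.
        for s in range(len(self.v_perm)):
            self.v_perm[s] += self.beta * self.v_trans[s]
        self.v_trans = [0.0] * len(self.v_trans)


# Usage: a one-state episodic task whose reward flips between two "tasks".
agent = PermanentTransientTD(n_states=1)
for reward in (1.0, -1.0):
    for _ in range(50):
        agent.td_update(0, reward, 0, done=True)
    print(f"task reward {reward:+.0f}: V={agent.value(0):.3f}")
    agent.consolidate()
```

In this sketch the transient table tracks each task almost immediately, while the permanent table only drifts slowly toward what was learned across tasks; the 50-step loop and the two-task reward flip are arbitrary demo choices.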

Code Implementations: 1 repo
