LG | Nov 25, 2024

Continual Deep Reinforcement Learning with Task-Agnostic Policy Distillation

arXiv:2411.16532v1 | 10.417 citations | h-index: 12 | Has Code | Sci Rep

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of building universal learning systems that can handle multiple tasks efficiently, though the contribution appears incremental because it builds on existing distillation and intrinsic-motivation methods.

The paper tackles the problem of continual deep reinforcement learning by introducing the Task-Agnostic Policy Distillation (TAPD) framework, which improves sample efficiency for solving downstream tasks without requiring task labels or clear boundaries.

Central to the development of universal learning systems is the ability to solve multiple tasks without retraining from scratch when new data arrives. This is crucial because each task requires significant training time. Addressing the problem of continual learning necessitates various methods due to the complexity of the problem space. This problem space includes: (1) addressing catastrophic forgetting to retain previously learned tasks, (2) demonstrating positive forward transfer for faster learning, (3) ensuring scalability across numerous tasks, and (4) facilitating learning without requiring task labels, even in the absence of clear task boundaries. In this paper, the Task-Agnostic Policy Distillation (TAPD) framework is introduced. This framework alleviates problems (1)-(4) by incorporating a task-agnostic phase, where an agent explores its environment without any external goal and maximizes only its intrinsic motivation. The knowledge gained during this phase is later distilled for further exploration. Therefore, the agent acts in a self-supervised manner by systematically seeking novel states. By utilizing task-agnostic distilled knowledge, the agent can solve downstream tasks more efficiently, leading to improved sample efficiency. Our code is available at the repository: https://github.com/wabbajack1/TAPD.
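
The two ingredients named in the abstract, an intrinsic reward for novel states and the distillation of one policy into another, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which intrinsic-motivation signal or distillation objective TAPD uses, so the count-based novelty bonus (`CountNoveltyBonus`) and the KL-divergence loss (`distillation_loss`) below are assumptions chosen as common, simple stand-ins, and all names are hypothetical.

```python
import math
from collections import Counter


def softmax(logits):
    """Convert action logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(teacher_logits, student_logits):
    """Assumed distillation objective: KL(teacher || student) for one state.

    Minimizing this pushes the student's action distribution toward
    the teacher's.
    """
    return kl_divergence(softmax(teacher_logits), softmax(student_logits))


class CountNoveltyBonus:
    """Assumed intrinsic reward: 1 / sqrt(visit count) of a hashable state.

    Rarely visited states yield a larger bonus, so an agent maximizing it
    is drawn toward novel states without any external goal.
    """

    def __init__(self):
        self.counts = Counter()

    def reward(self, state):
        self.counts[state] += 1
        return 1.0 / math.sqrt(self.counts[state])


bonus = CountNoveltyBonus()
first_visit = bonus.reward("s0")   # first visit gives the largest bonus
second_visit = bonus.reward("s0")  # the bonus shrinks on repeat visits

same = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
different = distillation_loss([2.0, 0.0], [0.0, 2.0])
```

In this sketch the bonus for a state decays as it is revisited, and the loss is zero when teacher and student agree and positive otherwise. A practical agent would replace the visit counter with a learned novelty estimate for high-dimensional observations and average the loss over batches of states.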

