LGMay 28, 2022

Task-Agnostic Continual Reinforcement Learning: Gaining Insights and Overcoming Challenges

Massimo Caccia, Jonas Mueller, Taesup Kim, Laurent Charlin, Rasool Fakoor

arXiv:2205.14495v37.811 citationsh-index: 27Has Code

Originality Incremental advance

AI Analysis

This addresses the challenge of catastrophic forgetting in continual learning for AI agents, offering incremental improvements for resource-constrained and high-dimensional environments.

The paper tackled the problem of performance differences between task-agnostic continual learning and multi-task agents in reinforcement learning, finding that a replay-based recurrent method outperforms baselines and can surpass multi-task equivalents in high-dimensional settings.

Continual learning (CL) enables the development of models and agents that learn from a sequence of tasks while addressing the limitations of standard deep learning approaches, such as catastrophic forgetting. In this work, we investigate the factors that contribute to the performance differences between task-agnostic CL and multi-task (MTL) agents. We pose two hypotheses: (1) task-agnostic methods might provide advantages in settings with limited data, computation, or high dimensionality, and (2) faster adaptation may be particularly beneficial in continual learning settings, helping to mitigate the effects of catastrophic forgetting. To investigate these hypotheses, we introduce a replay-based recurrent reinforcement learning (3RL) methodology for task-agnostic CL agents. We assess 3RL on a synthetic task and the Meta-World benchmark, which includes 50 unique manipulation tasks. Our results demonstrate that 3RL outperforms baseline methods and can even surpass its multi-task equivalent in challenging settings with high dimensionality. We also show that the recurrent task-agnostic agent consistently outperforms or matches the performance of its transformer-based counterpart. These findings provide insights into the advantages of task-agnostic CL over task-aware MTL approaches and highlight the potential of task-agnostic methods in resource-constrained, high-dimensional, and multi-task environments.

View on arXiv PDF Code

Similar