AI · Nov 24, 2023

Efficient Open-world Reinforcement Learning via Knowledge Distillation and Autonomous Rule Discovery

Ekaterina Nikonova, Cheng Xue, Jochen Renz

arXiv:2311.14270v1 · 3.91 citations · h-index: 28

Originality: Incremental advance

AI Analysis

This work addresses the challenge of making AI agents more resilient and efficient in dynamic real-world environments. It represents an incremental improvement in reinforcement learning methods.

The paper tackles catastrophic forgetting and sample inefficiency in deep reinforcement learning for open-world adaptation. It proposes a framework that lets agents autonomously discover task-specific rules, which the authors report yields significantly higher learning efficiency and faster novelty adaptation than baseline agents.

Deep reinforcement learning suffers from catastrophic forgetting and sample inefficiency, making it less applicable to the ever-changing real world. However, the ability to use previously learned knowledge is essential for AI agents to quickly adapt to novelties. Often, certain spatial information observed by the agent in previous interactions can be leveraged to infer task-specific rules. Inferred rules can then help the agent avoid potentially dangerous situations in previously unseen states and guide the learning process, increasing the agent's novelty adaptation speed. In this work, we propose a general framework that is applicable to deep reinforcement learning agents. Our framework provides the agent with an autonomous way to discover task-specific rules in novel environments and to self-supervise its learning. We provide a rule-driven deep Q-learning agent (RDQ) as one possible implementation of that framework. We show that RDQ successfully extracts task-specific rules as it interacts with the world and uses them to drastically increase its learning efficiency. In our experiments, we show that the RDQ agent is significantly more resilient to novelties than the baseline agents, and is able to detect and adapt to novel situations faster.
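
To make the idea of rules guiding a Q-learning agent concrete, here is a minimal sketch of one common way to do it: filtering out actions that a rule flags as unsafe before the usual epsilon-greedy choice. This is an illustration only and is not taken from the paper; the `Rule` type, the `select_action` function, and the fallback to all actions when every action is blocked are assumptions made for the example, and the actual RDQ agent may use its rules quite differently.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical rule type (an assumption for this sketch):
# a rule returns True when taking `action` in `state` is considered unsafe.
Rule = Callable[[object, int], bool]


def select_action(
    q_values: Sequence[float],
    state: object,
    rules: List[Rule],
    epsilon: float = 0.1,
    rng: random.Random = random.Random(0),
) -> int:
    """Epsilon-greedy action selection restricted to actions no rule forbids."""
    actions = range(len(q_values))
    # Keep only the actions that every rule allows in this state.
    allowed = [a for a in actions if not any(rule(state, a) for rule in rules)]
    # Assumed fallback: if the rules block everything, ignore them.
    if not allowed:
        allowed = list(actions)
    # Explore among allowed actions with probability epsilon.
    if rng.random() < epsilon:
        return rng.choice(allowed)
    # Otherwise exploit: pick the allowed action with the highest Q-value.
    return max(allowed, key=lambda a: q_values[a])


# Example rule (hypothetical): never take action 1 next to a cliff.
def no_step_off_cliff(state: object, action: int) -> bool:
    return state == "cliff" and action == 1


q = [1.0, 5.0, 3.0]
greedy_near_cliff = select_action(q, "cliff", [no_step_off_cliff], epsilon=0.0)
greedy_elsewhere = select_action(q, "safe", [no_step_off_cliff], epsilon=0.0)
```

With `epsilon=0.0` the choice is purely greedy, so near the cliff the agent skips the highest-valued but forbidden action and takes the best remaining one, while elsewhere it behaves like an ordinary greedy agent.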
