LG · ML · Oct 4, 2019

I'm sorry Dave, I'm afraid I can't do that, Deep Q-learning from forbidden action

arXiv:1910.02078v4 · 4 citations
Originality: Incremental advance
AI Analysis

This addresses safety and efficiency issues for RL applications in domains like industrial robots or power grids, though it is an incremental improvement on existing methods.

The paper tackles the problem of reinforcement learning agents being unable to learn from forbidden actions in real-world environments with safety constraints, proposing a modified DQN algorithm that reduces constraint violations and accelerates convergence to near-optimal policies.

The use of Reinforcement Learning (RL) is still restricted to simulation or to enhancing human-operated systems through recommendations. Real-world environments (e.g. industrial robots or power grids) are generally designed with safety constraints in mind, implemented as valid-action masks or contingency controllers. For example, the range of motion and the angles of a robot's motors can be limited to physical boundaries. Violating constraints thus results in rejected actions or in entering a safe mode driven by an external controller, making RL agents incapable of learning from their mistakes. In this paper, we propose a simple modification of a state-of-the-art deep RL algorithm (DQN) that enables learning from forbidden actions. To do so, the standard Q-learning update is enhanced with an extra safety loss inspired by structured classification. We empirically show that it reduces the number of constraints hit during the learning phase and accelerates convergence to near-optimal policies compared to standard DQN. Experiments are conducted on a Visual Grid World environment and a Text-World domain.
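To make the idea of "a Q-learning update plus an extra safety loss" concrete, here is a minimal sketch in Python with NumPy. It is not the paper's exact formulation: the abstract does not give the loss, so the hinge (large-margin) form, the function names, and the parameters `margin`, `safety_weight`, and `valid_mask` are all illustrative assumptions. The sketch only shows how a classification-style margin term could push the Q-value of a rejected action below those of actions that are still considered valid.

```python
import numpy as np


def td_loss(q_values, action, reward, next_q_values, done, gamma=0.99):
    """Squared one-step TD error for a single transition (standard Q-learning)."""
    bootstrap = 0.0 if done else gamma * float(np.max(next_q_values))
    target = reward + bootstrap
    return (float(q_values[action]) - target) ** 2


def forbidden_action_loss(q_values, forbidden_action, valid_mask, margin=0.1):
    """Hinge penalty (assumed form, not taken from the paper).

    It is zero once the forbidden action's Q-value sits at least `margin`
    below the lowest Q-value among the actions flagged in `valid_mask`.
    """
    lowest_valid_q = float(np.min(q_values[valid_mask]))
    return max(0.0, float(q_values[forbidden_action]) + margin - lowest_valid_q)


def combined_loss(q_values, action, reward, next_q_values, done,
                  was_forbidden, valid_mask, safety_weight=1.0):
    """TD loss, plus the safety term when the environment rejected the action."""
    loss = td_loss(q_values, action, reward, next_q_values, done)
    if was_forbidden:
        loss += safety_weight * forbidden_action_loss(q_values, action, valid_mask)
    return loss
```

In a real agent these terms would be computed on batches with an automatic-differentiation framework and the gradient would flow through `q_values`; the scalar version above is only meant to show how the two terms combine.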

