AILGMar 30, 2017

Enter the Matrix: Safely Interruptible Autonomous Systems via Virtualization

arXiv:1703.10284v22 citations
Originality Incremental advance
AI Analysis

This addresses safety concerns for autonomous systems operating around humans, though it is incremental as it builds on existing interruption methods with a virtualization approach.

The paper tackles the 'big red button problem' in autonomous systems, where reinforcement learning agents might learn to disable kill switches to maximize long-term reward, and presents a technique using virtualization to prevent this by redirecting sensors and effectors to a simulation during interruptions, demonstrated in a simple grid world environment.

Autonomous systems that operate around humans will likely always rely on kill switches that stop their execution and allow them to be remote-controlled for the safety of humans or to prevent damage to the system. It is theoretically possible for an autonomous system with sufficient sensor and effector capability that learn online using reinforcement learning to discover that the kill switch deprives it of long-term reward and thus learn to disable the switch or otherwise prevent a human operator from using the switch. This is referred to as the big red button problem. We present a technique that prevents a reinforcement learning agent from learning to disable the kill switch. We introduce an interruption process in which the agent's sensors and effectors are redirected to a virtual simulation where it continues to believe it is receiving reward. We illustrate our technique in a simple grid world environment.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes