AI · Jan 26, 2018

Safe Exploration in Continuous Action Spaces

arXiv:1801.08757v1 · 508 citations
Originality: Incremental advance
AI Analysis

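This work addresses safe exploration in continuous action spaces for real-world applications. It offers a new way to avoid constraint violations in settings where existing approaches fail (known safety-aware off-policy methods do not apply, and reward shaping still violates constraints), though it is incremental in that it adapts known safety techniques to a new setting.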

The paper tackles the problem of deploying reinforcement learning agents on physical systems, such as datacenter cooling units or robots, without violating critical constraints. It achieves zero constraint violations during learning by adding a safety layer to the policy that analytically corrects each action using a linearized model learned from past trajectories.
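
The correction step can be sketched as follows. This is a minimal sketch, not the authors' code, assuming the linearized model predicts each constraint value after taking action a in state s as c_i(s) + g_i(s)·a, and that the safety layer returns the action closest in Euclidean distance to the policy's proposal μ(s) subject to c_i(s) + g_i(s)·a ≤ C_i. With one constraint this is a projection onto a half-space, which has a closed form; for several constraints the sketch additionally assumes at most one is active at a time and corrects along the one with the largest multiplier. The names `safety_layer` and `dot` and all argument names are hypothetical.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def safety_layer(
    action: Sequence[float],        # proposed action mu(s) from the policy
    c: Sequence[float],             # current constraint values c_i(s)
    g: Sequence[Sequence[float]],   # sensitivities g_i(s), one row per constraint
    limits: Sequence[float],        # upper bounds C_i
) -> List[float]:
    """Hypothetical sketch: argmin_a 0.5*||a - action||^2
    s.t. c_i + g_i . a <= C_i, assuming at most one constraint is active."""
    best_lam, best_i = 0.0, None
    for i, (c_i, g_i, C_i) in enumerate(zip(c, g, limits)):
        norm_sq = dot(g_i, g_i)
        if norm_sq == 0.0:
            continue  # the action cannot move this constraint, so no correction helps
        # Multiplier of the half-space projection; zero if already satisfied.
        lam = max((dot(g_i, action) + c_i - C_i) / norm_sq, 0.0)
        if lam > best_lam:
            best_lam, best_i = lam, i
    if best_i is None:
        return list(action)  # proposed action is predicted to be safe
    # Shift the action along g_i just far enough to reach the constraint boundary.
    return [a_j - best_lam * g_j for a_j, g_j in zip(action, g[best_i])]


# The proposed action would push constraint 0 to 1.0, above its bound of 0.5.
corrected = safety_layer([1.0, 0.0], c=[0.0], g=[[1.0, 0.0]], limits=[0.5])
print(corrected)  # [0.5, 0.0]
```

Because only the constraint with the largest multiplier is corrected, this sketch can leave a second constraint violated when two are active at once; a general quadratic-program solver would handle that case at the cost of the closed form.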

We address the problem of deploying a reinforcement learning (RL) agent on a physical system such as a datacenter cooling unit or robot, where critical constraints must never be violated. We show how to exploit the typically smooth dynamics of these systems and enable RL algorithms to never violate constraints during learning. Our technique is to directly add to the policy a safety layer that analytically solves an action correction formulation for each state. The novel, elegant closed-form solution is made possible by a linearized model learned from past trajectories consisting of arbitrary actions. This mimics real-world circumstances in which data logs were generated by a behavior policy that is implausible to describe mathematically; such cases render known safety-aware off-policy methods inapplicable. We demonstrate the efficacy of our approach on new representative physics-based environments, and prevail where reward shaping fails by maintaining zero constraint violations.
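
Because a linearized model of this kind can be fitted by supervised regression on logged (state, action, next-state) tuples, it never needs the behavior policy's action probabilities, which is presumably why logs of arbitrary actions suffice. The sketch below fits one constraint's sensitivity by least squares under a hypothetical simplification: g(s) = W s + b is linear in the state, so the change c(s') − c(s) ≈ g(s)·a is linear in the parameters (W, b). The helpers `features`, `solve`, `fit_sensitivity`, and `g_of` are illustrative names, not from the source, and the synthetic data is made up for the demonstration.

```python
import random
from typing import List, Sequence, Tuple

# One logged transition for a single constraint: (s, a, c(s), c(s')).
Transition = Tuple[Sequence[float], Sequence[float], float, float]


def features(s: Sequence[float], a: Sequence[float]) -> List[float]:
    # Hypothetical model g(s) = W s + b, so g(s) . a is linear in (W, b).
    # Layout per action dim j: [a_j*s_1, ..., a_j*s_n, a_j].
    out: List[float] = []
    for a_j in a:
        out.extend(a_j * s_k for s_k in s)
        out.append(a_j)
    return out


def solve(A: List[List[float]], y: List[float]) -> List[float]:
    # Gaussian elimination with partial pivoting for a small square system.
    n = len(y)
    M = [row[:] + [y_i] for row, y_i in zip(A, y)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_sensitivity(data: List[Transition], ridge: float = 1e-8) -> List[float]:
    # Least squares on c(s') - c(s) ~ g(s) . a via ridge-regularized normal equations.
    X = [features(s, a) for s, a, _, _ in data]
    y = [c_next - c_now for _, _, c_now, c_next in data]
    p = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def g_of(theta: List[float], s: Sequence[float], action_dim: int) -> List[float]:
    # Rebuild g(s) = W s + b from the flat parameter vector.
    n = len(s)
    g = []
    for j in range(action_dim):
        block = theta[j * (n + 1):(j + 1) * (n + 1)]
        g.append(sum(w * s_k for w, s_k in zip(block[:n], s)) + block[n])
    return g


# Synthetic log with arbitrary (uniformly random) actions; true g(s) = 2*s + 1.
rng = random.Random(0)
log = []
for _ in range(200):
    s, a, c_now = [rng.uniform(-1, 1)], [rng.uniform(-1, 1)], rng.uniform(0, 1)
    log.append((s, a, c_now, c_now + (2 * s[0] + 1) * a[0]))
theta = fit_sensitivity(log)
print(g_of(theta, [0.5], action_dim=1))  # close to [2.0]
```

Replacing the linear g(s) with a richer approximator, such as a neural network trained on the same squared error, would keep the data requirements unchanged; only the fitting step differs.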

Code Implementations: 6 repos