LG | SY | Nov 30, 2017

Safe Exploration for Identifying Linear Systems via Robust Optimization

arXiv:1711.11165v1 | 5 citations
Originality: Incremental advance
AI Analysis

This work is significant for practitioners deploying reinforcement learning in physical systems, such as data centers, where safety constraints are paramount and failures can be catastrophic.

This paper addresses the challenge of safely exploring unknown linear dynamical systems with Gaussian noise, aiming to identify system parameters with desired accuracy and confidence. The authors demonstrate a method to compute safe regions of the action space, starting from a nominal safe action, which allows for increased sample efficiency in identifying the system dynamics.

Safely exploring an unknown dynamical system is critical to the deployment of reinforcement learning (RL) in physical systems where failures may have catastrophic consequences. In scenarios where one knows little about the dynamics, diverse transition data covering relevant regions of state-action space is needed to apply either model-based or model-free RL. Motivated by the cooling of Google's data centers, we study how one can safely identify the parameters of a system model with a desired accuracy and confidence level. In particular, we focus on learning an unknown linear system with Gaussian noise assuming only that, initially, a nominal safe action is known. We define safety as satisfying specific linear constraints on the state space (e.g., requirements on process variables) that must hold over the span of an entire trajectory. Given a Probably Approximately Correct (PAC) style bound on the estimation error of model parameters, we show how to compute safe regions of action space by gradually growing a ball around the nominal safe action. One can apply any exploration strategy where actions are chosen from such safe regions. Experiments on a stylized model of data center cooling dynamics show how computing proper safe regions can increase the sample efficiency of safe exploration.
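To make the idea of growing a ball around the nominal safe action concrete, here is a minimal sketch. It is not the paper's algorithm. It is a simplified one-step version built on assumptions of our own: the dynamics are x' = A x + B u + w with M = [A B], the parameter error is given as a spectral-norm bound ||M - M_hat||_2 <= eps (standing in for a PAC-style bound), safety is C x' <= d on the mean next state only, and the noise term and multi-step trajectories are ignored. The function names `estimate_M` and `safe_radius` are hypothetical.

```python
import numpy as np


def estimate_M(Z, X_next):
    """Least-squares estimate of M = [A B] (illustrative helper).

    Z has rows z_t = [x_t, u_t]; X_next has rows x_{t+1}.
    """
    # Solves min_M ||Z M^T - X_next||_F; lstsq returns M^T first.
    M_T, *_ = np.linalg.lstsq(Z, X_next, rcond=None)
    return M_T.T


def safe_radius(M_hat, eps, x, u0, C, d):
    """Conservative radius r of a ball of actions around u0.

    Assumption (ours, not the paper's): every true M satisfies
    ||M - M_hat||_2 <= eps. For each constraint row c_i and z = [x, u]:
        c_i^T M z <= c_i^T M_hat z0 + r ||B_hat^T c_i||
                     + eps ||c_i|| (||z0|| + r)
    whenever ||u - u0|| <= r, so the largest certified r is closed-form.
    Returns 0.0 when even the nominal action cannot be certified.
    """
    z0 = np.concatenate([x, u0])
    n = x.shape[0]
    B_hat = M_hat[:, n:]  # action columns of the estimate
    c_norms = np.linalg.norm(C, axis=1)

    # Worst-case slack of each constraint at the nominal action.
    slack = d - C @ (M_hat @ z0) - eps * c_norms * np.linalg.norm(z0)
    if np.any(slack < 0):
        return 0.0

    # Rate at which the worst-case bound grows with the radius r.
    growth = np.linalg.norm(C @ B_hat, axis=1) + eps * c_norms
    radii = np.full(slack.shape, np.inf)
    mask = growth > 0
    radii[mask] = slack[mask] / growth[mask]
    return float(radii.min())


# Illustrative numbers only (not taken from the paper).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.5], [1.0]])
M_hat = np.hstack([A, B])
r = safe_radius(M_hat, eps=0.1, x=np.array([1.0, 0.0]),
                u0=np.array([0.0]), C=np.array([[1.0, 0.0]]),
                d=np.array([2.0]))
```

In this sketch a smaller `eps` yields a larger certified radius, which mirrors the abstract's point that better parameter estimates allow the safe region to grow. Any exploration strategy could then draw actions from the ball of radius `r` around `u0`.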
