OC · AI · LG · SY · Aug 25, 2020

Robust Reinforcement Learning: A Case Study in Linear Quadratic Regulation

arXiv:2008.11592v3 · 43 citations
AI Analysis

This paper addresses the robustness of reinforcement learning in control systems, providing incremental theoretical guarantees for a specific benchmark: discrete-time linear quadratic regulation.

The paper studies how robust policy iteration for linear quadratic regulation (LQR) is to errors in the learning process. It shows that the method is inherently robust: when the error in each iteration is bounded and small, the iterates stay bounded and enter and remain in a small neighborhood of the optimal LQR solution. It also proposes a novel off-policy optimistic least-squares policy iteration for LQR with additive stochastic disturbances, validated by a numerical example.

This paper studies the robustness of reinforcement learning algorithms to errors in the learning process. Specifically, we revisit the benchmark problem of discrete-time linear quadratic regulation (LQR) and study the long-standing open question: Under what conditions is the policy iteration method robustly stable from a dynamical systems perspective? Using advanced stability results in control theory, it is shown that policy iteration for LQR is inherently robust to small errors in the learning process and enjoys small-disturbance input-to-state stability: whenever the error in each iteration is bounded and small, the solutions of the policy iteration algorithm are also bounded, and, moreover, enter and stay in a small neighbourhood of the optimal LQR solution. As an application, a novel off-policy optimistic least-squares policy iteration for the LQR problem is proposed, when the system dynamics are subjected to additive stochastic disturbances. The proposed new results in robust reinforcement learning are validated by a numerical example.
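The robustness property described above can be illustrated with a minimal sketch, assuming a scalar system x_{k+1} = a x_k + b u_k with stage cost q x^2 + r u^2. This is not the paper's off-policy optimistic least-squares algorithm: the policy evaluation step is computed exactly and an artificial additive error is injected to stand in for the learning error. All function names and the numerical values are hypothetical, chosen only for illustration.

```python
import math


def evaluate_policy(a, b, q, r, k):
    """Exact cost coefficient P of the policy u = -k*x (policy evaluation)."""
    a_cl = a - b * k  # closed-loop dynamics under the gain k
    if abs(a_cl) >= 1.0:
        raise ValueError("gain is not stabilizing")
    # Scalar Lyapunov equation: P = q + r*k^2 + a_cl^2 * P
    return (q + r * k * k) / (1.0 - a_cl * a_cl)


def improve_policy(a, b, r, p):
    """Greedy gain with respect to the (possibly inexact) cost coefficient p."""
    return a * b * p / (r + b * b * p)


def optimal_cost(a, b, q, r):
    """Positive root of the scalar discrete-time algebraic Riccati equation."""
    c = r - q * b * b - a * a * r
    return (-c + math.sqrt(c * c + 4.0 * b * b * q * r)) / (2.0 * b * b)


def inexact_policy_iteration(a, b, q, r, k0, errors):
    """Policy iteration where errors[i] is added to the i-th evaluation."""
    k = k0
    history = []
    for e in errors:
        p_hat = evaluate_policy(a, b, q, r, k) + e  # perturbed evaluation
        history.append(p_hat)
        k = improve_policy(a, b, r, p_hat)
    return k, history


# Hypothetical example: open-loop unstable system, stabilizing initial gain.
a, b, q, r = 1.2, 1.0, 1.0, 1.0
p_star = optimal_cost(a, b, q, r)

# Exact policy iteration (zero error in every iteration).
_, exact = inexact_policy_iteration(a, b, q, r, 1.0, [0.0] * 30)

# Bounded error of magnitude 0.05 with alternating sign.
eps = 0.05
errors = [eps * (-1) ** i for i in range(30)]
_, noisy = inexact_policy_iteration(a, b, q, r, 1.0, errors)

print("exact gap:", abs(exact[-1] - p_star))
print("noisy gap:", abs(noisy[-1] - p_star))
```

In this sketch the exact iteration converges to the Riccati solution, while the perturbed iterates remain bounded and stay within a neighborhood of it whose size is on the order of the injected error. That is consistent with the small-disturbance input-to-state stability claimed in the abstract, though a scalar example with artificial errors is only an illustration, not evidence for the general result.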

