LG, SY, OC
Feb 2, 2021

Stability-Constrained Markov Decision Processes Using MPC

arXiv:2102.01383v1
17 citations
Originality: Incremental advance
AI Analysis

This work gives control engineers a method for guaranteeing that policies learned for discounted MDPs are stabilizing, a critical safety requirement in real-world applications.

This paper addresses the problem of solving discounted Markov Decision Processes (MDPs) while ensuring that the resulting policy is stabilizing. By reformulating stable discounted MDPs as undiscounted ones, the authors leverage Model Predictive Control (MPC) to construct policies that are stabilizing by design. The MPC-based policy yields the optimal policy of the discounted MDP when that policy is stabilizing, and the best stabilizing policy otherwise.
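As an illustration of what "stabilizing by design" can mean for an MPC policy, here is a minimal sketch on a deterministic scalar linear-quadratic problem. It is not the construction from the paper: the system `x+ = A*x + B*u`, the weights `Q` and `R`, and all function names are assumptions chosen for illustration. The sketch compares a horizon-1 undiscounted MPC policy without a terminal cost against the same policy with the stationary Riccati cost used as terminal cost, a standard way to obtain a stabilizing MPC scheme in the linear-quadratic case.

```python
def riccati_step(p, a, b, q, r):
    """One backward step of the scalar Riccati recursion.

    Returns the feedback gain k (u = -k * x) and the updated cost weight.
    """
    k = a * b * p / (r + b * b * p)
    p_new = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return k, p_new


def mpc_gain(horizon, p_terminal, a, b, q, r):
    """First-stage gain of an undiscounted finite-horizon LQ problem.

    The MPC policy applies only this first gain and re-solves at each step.
    """
    p, k = p_terminal, 0.0
    for _ in range(horizon):
        k, p = riccati_step(p, a, b, q, r)
    return k


def stationary_cost(a, b, q, r, iters=1000):
    """Approximate the fixed point of the Riccati recursion by iteration."""
    p = q
    for _ in range(iters):
        _, p = riccati_step(p, a, b, q, r)
    return p


# Hypothetical open-loop unstable system and cost weights.
A, B, Q, R = 1.2, 1.0, 1.0, 1.0

p_inf = stationary_cost(A, B, Q, R)

# Horizon 1, no terminal cost: the optimizer sees no benefit in acting.
k_naive = mpc_gain(1, 0.0, A, B, Q, R)
# Horizon 1, stationary Riccati cost as terminal cost.
k_stable = mpc_gain(1, p_inf, A, B, Q, R)

pole_naive = A - B * k_naive    # closed-loop pole, magnitude > 1 means unstable
pole_stable = A - B * k_stable  # closed-loop pole, magnitude < 1 means stable

print("closed-loop pole without terminal cost:", pole_naive)
print("closed-loop pole with terminal cost:   ", pole_stable)
```

In this toy setting the stability guarantee comes from the structure of the MPC problem (here, the terminal cost) rather than from the data used to tune it, which is the property the summary above refers to.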

In this paper, we consider solving discounted Markov Decision Processes (MDPs) under the constraint that the resulting policy is stabilizing. In practice MDPs are solved based on some form of policy approximation. We will leverage recent results proposing to use Model Predictive Control (MPC) as a structured policy in the context of Reinforcement Learning to make it possible to introduce stability requirements directly inside the MPC-based policy. This will restrict the solution of the MDP to stabilizing policies by construction. The stability theory for MPC is most mature for the undiscounted MPC case. Hence, we will first show in this paper that stable discounted MDPs can be reformulated as undiscounted ones. This observation will entail that the MPC-based policy with stability requirements will produce the optimal policy for the discounted MDP if it is stable, and the best stabilizing policy otherwise.
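To make the idea of recasting a discounted problem as an undiscounted one concrete, the following sketch checks a simple Bellman identity on a small finite MDP. It is an illustrative assumption, not necessarily the reformulation derived in the paper: if `v_star` is the optimal value function for discount `GAMMA`, then with the modified stage cost `L_hat(s, a) = L(s, a) - (1 - GAMMA) * E[v_star(s+)]` the same `v_star` satisfies the undiscounted Bellman equation and induces the same greedy policy. The transition table `P`, the costs `L`, and all helper names are hypothetical.

```python
GAMMA = 0.9
N_STATES, N_ACTIONS = 3, 2

# P[s][a] lists next-state probabilities; L[s][a] is the stage cost.
P = [
    [[0.9, 0.1, 0.0], [0.2, 0.8, 0.0]],
    [[0.5, 0.5, 0.0], [0.0, 0.3, 0.7]],
    [[0.0, 0.4, 0.6], [0.1, 0.0, 0.9]],
]
L = [[0.0, 1.0], [2.0, 0.5], [1.0, 3.0]]


def expect(p_row, v):
    """Expected value of v over the next-state distribution p_row."""
    return sum(p * x for p, x in zip(p_row, v))


def q_values(v, cost, gamma):
    """Q(s, a) = cost(s, a) + gamma * E[v(s+) | s, a]."""
    return [
        [cost[s][a] + gamma * expect(P[s][a], v) for a in range(N_ACTIONS)]
        for s in range(N_STATES)
    ]


def value_iteration(cost, gamma, iters=2000):
    """Plain value iteration for a cost-minimizing discounted MDP."""
    v = [0.0] * N_STATES
    for _ in range(iters):
        v = [min(row) for row in q_values(v, cost, gamma)]
    return v


def greedy(q):
    """Greedy (cost-minimizing) action index for every state."""
    return [row.index(min(row)) for row in q]


# Solve the discounted problem.
v_star = value_iteration(L, GAMMA)
policy_disc = greedy(q_values(v_star, L, GAMMA))

# Modified stage cost that absorbs the discount factor.
L_hat = [
    [L[s][a] - (1.0 - GAMMA) * expect(P[s][a], v_star) for a in range(N_ACTIONS)]
    for s in range(N_STATES)
]

# Check the undiscounted Bellman equation (gamma = 1) with cost L_hat.
q_undisc = q_values(v_star, L_hat, 1.0)
residual = max(abs(min(q_undisc[s]) - v_star[s]) for s in range(N_STATES))
policy_undisc = greedy(q_undisc)

print("Bellman residual of the undiscounted problem:", residual)
print("policies match:", policy_disc == policy_undisc)
```

Two caveats apply to this sketch. The modified cost depends on `v_star`, which is not known in advance, so the identity shows that an equivalent undiscounted problem exists rather than how to find it. The check also says nothing about closed-loop stability, which the abstract treats as the condition under which the reformulation holds.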
