SYSY · Dec 21, 2017

From Dissipativity Theory to Compositional Construction of Finite Markov Decision Processes

arXiv:1712.07793 · 40 citations · h-index: 35
Originality: Incremental advance
AI Analysis

For control engineers and researchers, this work addresses the scalability challenge in constructing finite MDPs for large-scale stochastic systems, offering a compositional framework that avoids restrictive assumptions.

The paper proposes a compositional method for constructing finite Markov decision processes (MDPs) of interconnected stochastic control systems using dissipativity theory, enabling scalable policy synthesis. The approach is demonstrated on a 200-room temperature regulation network, where the compositional condition imposes no constraints on subsystem number or gains.

This paper is concerned with a compositional approach for constructing finite Markov decision processes of interconnected discrete-time stochastic control systems. The proposed approach leverages the interconnection topology and a notion of so-called stochastic storage functions describing joint dissipativity-type properties of subsystems and their abstractions. In the first part of the paper, we derive dissipativity-type compositional conditions for quantifying the error between the interconnection of stochastic control subsystems and that of their abstractions. In the second part of the paper, we propose an approach to construct finite Markov decision processes together with their corresponding stochastic storage functions for classes of discrete-time control systems satisfying some incremental passivability property. Under this property, one can construct finite Markov decision processes by a suitable discretization of the input and state sets. Moreover, we show that for linear stochastic control systems, the aforementioned property can be readily checked by some matrix inequality. We apply our proposed results to the temperature regulation in a circular building by constructing compositionally a finite Markov decision process of a network containing 200 rooms in which the compositionality condition does not require any constraint on the number or gains of the subsystems. We employ the constructed finite Markov decision process as a substitute to synthesize policies regulating the temperature in each room for a bounded time horizon.
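
To make the discretization step in the abstract concrete, the following is a minimal sketch, not the paper's construction. It assumes a hypothetical scalar subsystem x' = a*x + b*u + w with Gaussian noise w ~ N(0, sigma^2); the function name `build_finite_mdp`, the parameters, and the absorbing "outside" state are all illustrative assumptions. The state interval is split into equal cells, each cell is represented by its center, and the transition probability to every cell is obtained from the normal CDF.

```python
import math


def normal_cdf(x, mean, std):
    """CDF of a normal distribution N(mean, std^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def build_finite_mdp(a, b, sigma, x_min, x_max, n_cells, inputs):
    """Finite abstraction of the ASSUMED scalar system x' = a*x + b*u + w.

    Returns (centers, P), where P[i][k][j] is the probability of moving
    from cell i to cell j under inputs[k]. Index j == n_cells is an
    absorbing sink collecting the probability of leaving [x_min, x_max].
    """
    delta = (x_max - x_min) / n_cells
    centers = [x_min + (i + 0.5) * delta for i in range(n_cells)]
    P = []
    for c in centers:
        per_input = []
        for u in inputs:
            mean = a * c + b * u  # one-step mean from the cell's representative point
            row = []
            for j in range(n_cells):
                lo = x_min + j * delta
                hi = lo + delta
                row.append(normal_cdf(hi, mean, sigma) - normal_cdf(lo, mean, sigma))
            row.append(max(0.0, 1.0 - sum(row)))  # mass that leaves the interval
            per_input.append(row)
        P.append(per_input)
    return centers, P


# Hypothetical numbers, loosely in the spirit of a room-temperature model.
centers, P = build_finite_mdp(
    a=0.8, b=0.2, sigma=0.1,
    x_min=19.0, x_max=21.0, n_cells=20,
    inputs=[15.0, 20.0, 25.0],
)
```

A finer grid (larger `n_cells`, more entries in `inputs`) gives a closer abstraction at the cost of a larger table. The paper's contribution is to bound the abstraction error through stochastic storage functions and to build such abstractions subsystem by subsystem rather than for the whole network; that part is not reproduced in this sketch.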

