Multiple Mean-Payoff Optimization under Local Stability Constraints
This addresses the issue of unstable performance in discrete systems for researchers and practitioners in control theory and game theory, though it is incremental as it builds on prior work in mean-payoff optimization.
The paper tackles the problem of constructing controllers that simultaneously optimize multiple mean payoffs under local stability constraints, which was previously computationally hard and lacked practical algorithms; it presents the first efficient and scalable solution for Markov decision processes.
The long-run average payoff per transition (mean payoff) is the main tool for specifying the performance and dependability properties of discrete systems. The problem of constructing a controller (strategy) simultaneously optimizing several mean payoffs has been deeply studied for stochastic and game-theoretic models. One common issue of the constructed controllers is the instability of the mean payoffs, measured by the deviations of the average rewards per transition computed in a finite "window" sliding along a run. Unfortunately, the problem of simultaneously optimizing the mean payoffs under local stability constraints is computationally hard, and the existing works do not provide a practically usable algorithm even for non-stochastic models such as two-player games. In this paper, we design and evaluate the first efficient and scalable solution to this problem applicable to Markov decision processes.