An Online Learning Approach for Two-Player Zero-Sum Linear Quadratic Games
This work addresses control and game-theoretic problems relevant to applications such as robotics and economics; its contribution is incremental, since it builds on existing methods for linear quadratic games.
The paper tackles two-player zero-sum linear quadratic games with unknown dynamics by developing an online learning approach that combines model estimation with policy updates, supported by a regret analysis and a numerical example that checks convergence.
In this paper, we present an online learning approach for two-player zero-sum linear quadratic games with unknown dynamics. We develop a framework that combines regularized least squares model estimation, high-probability confidence sets, and surrogate model selection to maintain a regular model for policy updates. At each episode, we apply a shrinkage step to identify a surrogate model in the region where the generalized algebraic Riccati equation admits a stabilizing saddle-point solution. We then establish a regret analysis for the algorithm's convergence, followed by a numerical example that illustrates the convergence performance and verifies the regret analysis.
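The regularized least squares estimation step mentioned above can be illustrated with a minimal sketch. It is not taken from the paper. It assumes dynamics of the form x_{t+1} = A x_t + B u_t + C w_t + noise, with u and w the two players' inputs, and it excites the system with random inputs rather than the paper's policies. The function names `simulate_trajectory` and `ridge_estimate`, the regularization weight `lam`, and all numerical values are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_trajectory(A, B, C, T, noise_std, rng):
    """Roll out x_{t+1} = A x_t + B u_t + C w_t + noise with random inputs.

    Assumption: both players apply i.i.d. Gaussian inputs, which is only a
    stand-in for the policies an online algorithm would actually play.
    Returns next states X (T, n) and regressors Z (T, n + m + p).
    """
    n, m, p = A.shape[0], B.shape[1], C.shape[1]
    x = np.zeros(n)
    X = np.empty((T, n))
    Z = np.empty((T, n + m + p))
    for t in range(T):
        u = rng.standard_normal(m)  # player 1 input
        w = rng.standard_normal(p)  # player 2 input
        Z[t] = np.concatenate([x, u, w])
        x = A @ x + B @ u + C @ w + noise_std * rng.standard_normal(n)
        X[t] = x
    return X, Z


def ridge_estimate(X, Z, lam=1.0):
    """Regularized least squares estimate of Theta = [A B C].

    Solves min_Theta sum_t ||x_{t+1} - Theta z_t||^2 + lam * ||Theta||_F^2.
    Also returns the design matrix V = lam * I + Z^T Z, which is the matrix
    that typically shapes an ellipsoidal confidence set around the estimate.
    """
    d = Z.shape[1]
    V = lam * np.eye(d) + Z.T @ Z
    theta_hat = np.linalg.solve(V, Z.T @ X).T  # shape (n, d)
    return theta_hat, V


# Hypothetical stable system with two states and one input per player.
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
B_true = np.array([[0.0], [1.0]])
C_true = np.array([[0.5], [0.0]])

rng = np.random.default_rng(0)
X, Z = simulate_trajectory(A_true, B_true, C_true, T=2000, noise_std=0.1, rng=rng)
theta_hat, V = ridge_estimate(X, Z, lam=1.0)
error = np.linalg.norm(theta_hat - np.hstack([A_true, B_true, C_true]))
print("Frobenius estimation error:", error)
```

In an online scheme, an estimate of this kind would be recomputed as data accumulates, and the confidence set built from `V` would then be the region searched for a surrogate model. That selection step and the Riccati-based policy update are not sketched here.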