Policy Optimization finds Nash Equilibrium in Regularized General-Sum LQ Games
This work addresses equilibrium computation in multi-agent reinforcement learning by providing theoretical convergence guarantees, though it appears incremental because it builds on existing regularized policy optimization methods.
The paper tackles the problem of finding Nash Equilibria (NE) in general-sum linear-quadratic games by introducing relative entropy regularization. It shows that the NE correspond to linear Gaussian policies and proves that a policy optimization algorithm converges linearly to the NE when the regularization is sufficiently strong. When the regularization is insufficient, a δ-augmentation technique achieves an ε-NE instead.
In this paper, we investigate the impact of introducing relative entropy regularization on the Nash Equilibria (NE) of General-Sum $N$-agent games, showing that the NE of such games conform to linear Gaussian policies. Moreover, we give sufficient conditions, which depend on the adequacy of the entropy regularization, for the uniqueness of the NE within the game. As Policy Optimization serves as a foundational approach for Reinforcement Learning (RL) techniques aimed at finding the NE, we prove the linear convergence of a policy optimization algorithm which, subject to the adequacy of the entropy regularization, provably attains the NE. Furthermore, in scenarios where the entropy regularization is insufficient, we present a $\delta$-augmentation technique that achieves an $\epsilon$-NE within the game.
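To make the role of "sufficient regularization" concrete, here is a minimal sketch on a toy problem. It is not the paper's algorithm or setting. It assumes a static (one-shot) two-player scalar quadratic game, Gaussian policies $a_i \sim N(\mu_i, \sigma_i^2)$, and a KL penalty toward a standard normal reference policy with weight `tau`. All names and parameter values (`r`, `b`, `d`, `tau`, `gradient_play`) are hypothetical and chosen only for illustration. In this toy game, simultaneous gradient steps on the policy means diverge without regularization and contract linearly toward the equilibrium once `tau` is large enough.

```python
# Toy static two-player quadratic game (an illustrative assumption,
# not the dynamic LQ setting of the paper).
# Player i draws a_i ~ N(mu_i, sigma_i^2) independently and pays
#   E[0.5*r_i*a_i^2 + b_i*a_i*a_j + d_i*a_i]
#     + tau * KL(N(mu_i, sigma_i^2) || N(0, 1)).


def mean_gradients(mu, r, b, d, tau):
    """Gradient of each player's regularized cost w.r.t. its own mean."""
    g0 = (r[0] + tau) * mu[0] + b[0] * mu[1] + d[0]
    g1 = (r[1] + tau) * mu[1] + b[1] * mu[0] + d[1]
    return (g0, g1)


def nash_means(r, b, d, tau):
    """Solve the 2x2 first-order conditions with Cramer's rule."""
    a00, a01 = r[0] + tau, b[0]
    a10, a11 = b[1], r[1] + tau
    det = a00 * a11 - a01 * a10
    mu0 = (-d[0] * a11 + a01 * d[1]) / det
    mu1 = (-a00 * d[1] + a10 * d[0]) / det
    return (mu0, mu1)


def nash_variances(r, tau):
    """Stationary variance of each Gaussian policy: tau / (r_i + tau)."""
    return tuple(tau / (ri + tau) for ri in r)


def gradient_play(r, b, d, tau, step=0.1, iters=200):
    """Simultaneous gradient descent on the means; returns final means
    and the max-norm distance to the equilibrium after every step."""
    target = nash_means(r, b, d, tau)
    mu = (0.0, 0.0)
    errors = []
    for _ in range(iters):
        g = mean_gradients(mu, r, b, d, tau)
        mu = (mu[0] - step * g[0], mu[1] - step * g[1])
        errors.append(max(abs(mu[0] - target[0]), abs(mu[1] - target[1])))
    return mu, errors


if __name__ == "__main__":
    r, b, d = (1.0, 1.0), (2.0, 1.5), (1.0, -1.0)  # hypothetical game data
    for tau in (0.0, 2.0):
        mu, errors = gradient_play(r, b, d, tau)
        print(f"tau={tau}: first error={errors[0]:.3e}, last error={errors[-1]:.3e}")
    print("equilibrium variances at tau=2:", nash_variances(r, 2.0))
```

With these illustrative values the coupling terms `b` dominate the curvature `r` when `tau = 0`, so the update map is not a contraction. Adding `tau` to the diagonal of the first-order conditions restores contraction, which is the intuition behind requiring the regularization to be large enough. The toy does not cover the $\delta$-augmentation step.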