GTAug 7, 2023
Unsynchronized Decentralized Q-Learning: Two Timescale Analysis By PersistenceBora Yongacoglu, Gürdal Arslan, Serdar Yüksel
Non-stationarity is a fundamental challenge in multi-agent reinforcement learning (MARL), where agents update their behaviour as they learn. Many theoretical advances in MARL avoid the challenge of non-stationarity by coordinating the policy updates of agents in various ways, including synchronizing times at which agents are allowed to revise their policies. Synchronization enables analysis of many MARL algorithms via multi-timescale methods, but such synchronization is infeasible in many decentralized applications. In this paper, we study an unsynchronized variant of the decentralized Q-learning algorithm, a recent MARL algorithm for stochastic games. We provide sufficient conditions under which the unsynchronized algorithm drives play to equilibrium with high probability. Our solution utilizes constant learning rates in the Q-factor update, which we show to be critical for relaxing the synchronization assumptions of earlier work. Our analysis also applies to unsynchronized generalizations of a number of other algorithms from the regret testing tradition, whose performance is analyzed by multi-timescale methods that study Markov chains obtained via policy update dynamics. This work extends the applicability of the decentralized Q-learning algorithm and its relatives to settings in which parameters are selected in an independent manner, and tames non-stationarity without imposing the coordination assumptions of prior work.
GTMar 26, 2024
Paths to Equilibrium in GamesBora Yongacoglu, Gürdal Arslan, Lacra Pavel et al.
In multi-agent reinforcement learning (MARL) and game theory, agents repeatedly interact and revise their strategies as new data arrives, producing a sequence of strategy profiles. This paper studies sequences of strategies satisfying a pairwise constraint inspired by policy updating in reinforcement learning, where an agent who is best responding in one period does not switch its strategy in the next period. This constraint merely requires that optimizing agents do not switch strategies, but does not constrain the non-optimizing agents in any way, and thus allows for exploration. Sequences with this property are called satisficing paths, and arise naturally in many MARL algorithms. A fundamental question about strategic dynamics is such: for a given game and initial strategy profile, is it always possible to construct a satisficing path that terminates at an equilibrium? The resolution of this question has implications about the capabilities or limitations of a class of MARL algorithms. We answer this question in the affirmative for normal-form games. Our analysis reveals a counterintuitive insight that reward deteriorating strategic updates are key to driving play to equilibrium along a satisficing path.
GTOct 9, 2021
Satisficing Paths and Independent Multi-Agent Reinforcement Learning in Stochastic GamesBora Yongacoglu, Gürdal Arslan, Serdar Yüksel
In multi-agent reinforcement learning (MARL), independent learners are those that do not observe the actions of other agents in the system. Due to the decentralization of information, it is challenging to design independent learners that drive play to equilibrium. This paper investigates the feasibility of using satisficing dynamics to guide independent learners to approximate equilibrium in stochastic games. For $ε\geq 0$, an $ε$-satisficing policy update rule is any rule that instructs the agent to not change its policy when it is $ε$-best-responding to the policies of the remaining players; $ε$-satisficing paths are defined to be sequences of joint policies obtained when each agent uses some $ε$-satisficing policy update rule to select its next policy. We establish structural results on the existence of $ε$-satisficing paths into $ε$-equilibrium in both symmetric $N$-player games and general stochastic games with two players. We then present an independent learning algorithm for $N$-player symmetric games and give high probability guarantees of convergence to $ε$-equilibrium under self-play. This guarantee is made using symmetry alone, leveraging the previously unexploited structure of $ε$-satisficing paths.
OCJun 25, 2015
Decentralized Q-Learning for Stochastic Teams and GamesGürdal Arslan, Serdar Yüksel
There are only a few learning algorithms applicable to stochastic dynamic teams and games which generalize Markov decision processes to decentralized stochastic control problems involving possibly self-interested decision makers. Learning in games is generally difficult because of the non-stationary environment in which each decision maker aims to learn its optimal decisions with minimal information in the presence of the other decision makers who are also learning. In stochastic dynamic games, learning is more challenging because, while learning, the decision makers alter the state of the system and hence the future cost. In this paper, we present decentralized Q-learning algorithms for stochastic games, and study their convergence for the weakly acyclic case which includes team problems as an important special case. The algorithm is decentralized in that each decision maker has access to only its local information, the state information, and the local cost realizations; furthermore, it is completely oblivious to the presence of other decision makers. We show that these algorithms converge to equilibrium policies almost surely in large classes of stochastic games.