Feature-Based Q-Learning for Two-Player Stochastic Games
This work addresses sample efficiency in multi-agent reinforcement learning for stochastic games, offering a method with complexities independent of original game dimensions, though it is incremental as it builds on existing Q-learning and feature-based techniques.
The paper tackles the problem of approximating Nash equilibrium strategies in two-player zero-sum stochastic games with feature-based transitions, proposing a Q-learning algorithm that achieves ε-optimal strategies with sample size linear in the number of features, and an accelerated version reduces sample complexity to Õ(K/(ε²(1-γ)⁴)).
Consider a two-player zero-sum stochastic game where the transition function can be embedded in a given feature space. We propose a two-player Q-learning algorithm for approximating the Nash equilibrium strategy via sampling. The algorithm is shown to find an $ε$-optimal strategy using sample size linear to the number of features. To further improve its sample efficiency, we develop an accelerated algorithm by adopting techniques such as variance reduction, monotonicity preservation and two-sided strategy approximation. We prove that the algorithm is guaranteed to find an $ε$-optimal strategy using no more than $\tilde{\mathcal{O}}(K/(ε^{2}(1-γ)^{4}))$ samples with high probability, where $K$ is the number of features and $γ$ is a discount factor. The sample, time and space complexities of the algorithm are independent of original dimensions of the game.