LG GTOct 25, 2023

Near-Optimal Pure Exploration in Matrix Games: A Generalization of Stochastic Bandits & Dueling Bandits

Arnab Maiti, Ross Boczar, Kevin Jamieson, Lillian J. Ratliff

arXiv:2310.16252v29.811 citationsh-index: 8Has Code

Originality Highly original

AI Analysis

This work addresses a fundamental problem in game theory and reinforcement learning for researchers and practitioners, providing a generalization and optimal solution for stochastic and dueling bandits.

The paper tackles the problem of identifying the pure strategy Nash equilibrium in noisy two-player zero-sum matrix games with minimal samples, achieving a near-optimal algorithm that matches the instance-dependent lower bound up to logarithmic factors.

We study the sample complexity of identifying the pure strategy Nash equilibrium (PSNE) in a two-player zero-sum matrix game with noise. Formally, we are given a stochastic model where any learner can sample an entry $(i,j)$ of the input matrix $A\in[-1,1]^{n\times m}$ and observe $A_{i,j}+η$ where $η$ is a zero-mean 1-sub-Gaussian noise. The aim of the learner is to identify the PSNE of $A$, whenever it exists, with high probability while taking as few samples as possible. Zhou et al. (2017) presents an instance-dependent sample complexity lower bound that depends only on the entries in the row and column in which the PSNE lies. We design a near-optimal algorithm whose sample complexity matches the lower bound, up to log factors. The problem of identifying the PSNE also generalizes the problem of pure exploration in stochastic multi-armed bandits and dueling bandits, and our result matches the optimal bounds, up to log factors, in both the settings.

View on arXiv PDF Code

Similar