LGGTOct 25, 2023

Near-Optimal Pure Exploration in Matrix Games: A Generalization of Stochastic Bandits & Dueling Bandits

arXiv:2310.16252v211 citationsh-index: 8
Originality Highly original
AI Analysis

This work addresses a fundamental problem in game theory and reinforcement learning for researchers and practitioners, providing a generalization and optimal solution for stochastic and dueling bandits.

The paper tackles the problem of identifying the pure strategy Nash equilibrium in noisy two-player zero-sum matrix games with minimal samples, achieving a near-optimal algorithm that matches the instance-dependent lower bound up to logarithmic factors.

We study the sample complexity of identifying the pure strategy Nash equilibrium (PSNE) in a two-player zero-sum matrix game with noise. Formally, we are given a stochastic model where any learner can sample an entry $(i,j)$ of the input matrix $A\in[-1,1]^{n\times m}$ and observe $A_{i,j}+η$ where $η$ is a zero-mean 1-sub-Gaussian noise. The aim of the learner is to identify the PSNE of $A$, whenever it exists, with high probability while taking as few samples as possible. Zhou et al. (2017) presents an instance-dependent sample complexity lower bound that depends only on the entries in the row and column in which the PSNE lies. We design a near-optimal algorithm whose sample complexity matches the lower bound, up to log factors. The problem of identifying the PSNE also generalizes the problem of pure exploration in stochastic multi-armed bandits and dueling bandits, and our result matches the optimal bounds, up to log factors, in both the settings.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes