LGJun 4, 2025

Learning Equilibria in Matching Games with Bandit Feedback

Andreas Athanasopoulos, Christos Dimitrakakis

arXiv:2506.03802v14.1h-index: 3

Originality Incremental advance

AI Analysis

This addresses the challenge of decentralized decision-making in matching markets for agents and platforms, but it is incremental as it builds on existing matching and bandit learning frameworks.

The paper tackles the problem of learning equilibria in two-sided matching markets where agents engage in zero-sum games with unknown payoffs, using bandit feedback, and proposes a UCB algorithm that achieves sublinear, instance-independent regret over time horizon T.

We investigate the problem of learning an equilibrium in a generalized two-sided matching market, where agents can adaptively choose their actions based on their assigned matches. Specifically, we consider a setting in which matched agents engage in a zero-sum game with initially unknown payoff matrices, and we explore whether a centralized procedure can learn an equilibrium from bandit feedback. We adopt the solution concept of matching equilibrium, where a pair consisting of a matching $\mathfrak{m}$ and a set of agent strategies $X$ forms an equilibrium if no agent has the incentive to deviate from $(\mathfrak{m}, X)$. To measure the deviation of a given pair $(\mathfrak{m}, X)$ from the equilibrium pair $(\mathfrak{m}^\star, X^\star)$, we introduce matching instability that can serve as a regret measure for the corresponding learning problem. We then propose a UCB algorithm in which agents form preferences and select actions based on optimistic estimates of the game payoffs, and prove that it achieves sublinear, instance-independent regret over a time horizon $T$.

View on arXiv PDF

Similar