LGMLNov 18, 2024

Competing Bandits in Decentralized Contextual Matching Markets

arXiv:2411.11794v2h-index: 11
Originality Incremental advance
AI Analysis

This addresses resource allocation in large-scale matching markets like job platforms, but it is incremental as it builds on existing bandit frameworks.

The paper tackles the problem of decentralized learning in two-sided matching markets with time-varying preferences and non-stationary latent environments, proposing algorithms that achieve instance-dependent logarithmic regret independent of the number of arms.

Sequential learning in a multi-agent resource constrained matching market has received significant interest in the past few years. We study decentralized learning in two-sided matching markets where the demand side (aka players or agents) competes for the supply side (aka arms) with potentially time-varying preferences to obtain a stable match. Motivated by the linear contextual bandit framework, we assume that for each agent, an arm-mean may be represented by a linear function of a known feature vector and an unknown (agent-specific) parameter. Moreover, the preferences over arms depend on a latent environment in each round, where the latent environment varies across rounds in a non-stationary manner. We propose learning algorithms to identify the latent environment and obtain stable matchings simultaneously. Our proposed algorithms achieve instance-dependent logarithmic regret, scaling independently of the number of arms, and hence applicable for a large market.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes