LGGTMAMLJun 12, 2019

Competing Bandits in Matching Markets

arXiv:1906.05363v2108 citations
Originality Incremental advance
AI Analysis

This addresses the need for data-driven matching platforms in online markets, though it is incremental as it builds on classical stable matching and bandit frameworks.

The paper tackles the problem of learning preferences in two-sided matching markets where one side lacks prior knowledge, extending multi-armed bandits to multiple players with arm preferences. It shows surprising exploration-exploitation trade-offs in both centralized and decentralized approaches.

Stable matching, a classical model for two-sided markets, has long been studied with little consideration for how each side's preferences are learned. With the advent of massive online markets powered by data-driven matching platforms, it has become necessary to better understand the interplay between learning and market objectives. We propose a statistical learning model in which one side of the market does not have a priori knowledge about its preferences for the other side and is required to learn these from stochastic rewards. Our model extends the standard multi-armed bandits framework to multiple players, with the added feature that arms have preferences over players. We study both centralized and decentralized approaches to this problem and show surprising exploration-exploitation trade-offs compared to the single player multi-armed bandits setting.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes