LGGTNov 30, 2024

Bandit Learning in Matching Markets: Utilitarian and Rawlsian Perspectives

arXiv:2412.00301v11 citationsh-index: 2
Originality Incremental advance
AI Analysis

This work addresses fairness and efficiency in matching markets like job placements and ride-sharing, but it is incremental as it builds on existing bandit learning frameworks.

The paper tackles the problem of learning preferences in two-sided matching markets where preferences are unknown, proposing algorithms to optimize both utilitarian and Rawlsian welfare while maintaining market stability, with simulated experiments evaluating these metrics.

Two-sided matching markets have demonstrated significant impact in many real-world applications, including school choice, medical residency placement, electric vehicle charging, ride sharing, and recommender systems. However, traditional models often assume that preferences are known, which is not always the case in modern markets, where preferences are unknown and must be learned. For example, a company may not know its preference over all job applicants a priori in online markets. Recent research has modeled matching markets as multi-armed bandit (MAB) problem and primarily focused on optimizing matching for one side of the market, while often resulting in a pessimal solution for the other side. In this paper, we adopt a welfarist approach for both sides of the market, focusing on two metrics: (1) Utilitarian welfare and (2) Rawlsian welfare, while maintaining market stability. For these metrics, we propose algorithms based on epoch Explore-Then-Commit (ETC) and analyze their regret bounds. Finally, we conduct simulated experiments to evaluate both welfare and market stability.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes