LG AI GT ST

Mar 7, 2022

Learn to Match with No Regret: Reinforcement Learning in Markov Matching Markets

Yifei Min, Tianhao Wang, Ruitu Xu, Zhaoran Wang, Michael I. Jordan, Zhuoran Yang

arXiv:2203.03684v1

16.130 citations, h-index: 187

Originality: Incremental advance

AI Analysis

This paper addresses matching market optimization for applications such as ridesharing platforms, but the advance is incremental because it builds on existing reinforcement learning and matching theory.

The paper tackles the problem of maximizing cumulative social welfare in Markov matching markets with strategic agents, where contexts determine utilities and the planner controls context transitions. The result is a reinforcement learning algorithm that achieves sublinear regret.

We study a Markov matching market involving a planner and a set of strategic agents on the two sides of the market. At each step, the agents are presented with a dynamical context, where the contexts determine the utilities. The planner controls the transition of the contexts to maximize the cumulative social welfare, while the agents aim to find a myopic stable matching at each step. Such a setting captures a range of applications including ridesharing platforms. We formalize the problem by proposing a reinforcement learning framework that integrates optimistic value iteration with maximum weight matching. The proposed algorithm addresses the coupled challenges of sequential exploration, matching stability, and function approximation. We prove that the algorithm achieves sublinear regret.
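The abstract names two ingredients, optimism and maximum weight matching, without giving details here. The toy sketch below shows how they could combine in a single step: estimated pair utilities are inflated by an exploration bonus, and a maximum weight matching is then computed on the inflated weights. The function names, the count-based bonus form, and the brute-force search are illustrative assumptions, not the paper's algorithm, which also handles context transitions and function approximation.

```python
from itertools import permutations


def optimistic_weights(estimates, counts, scale=1.0):
    """Add an exploration bonus to each estimated pair utility.

    The bonus scale / sqrt(1 + count) is a hypothetical, count-based choice
    used only for illustration; the paper's bonus is not specified here.
    """
    n = len(estimates)
    return [
        [estimates[i][j] + scale / (1 + counts[i][j]) ** 0.5 for j in range(n)]
        for i in range(n)
    ]


def max_weight_matching(weights):
    """Brute-force maximum weight perfect matching on a square matrix.

    weights[i][j] is the joint utility of pairing left agent i with
    right agent j. Returns (best_total, assignment), where assignment[i]
    is the right agent matched to left agent i. Runs in O(n!) time, so it
    is only suitable for tiny examples.
    """
    n = len(weights)
    best_total, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        total = sum(weights[i][perm[i]] for i in range(n))
        if total > best_total:
            best_total, best_perm = total, perm
    return best_total, list(best_perm)


# Toy data (made up): two agents per side.
estimates = [[1.0, 2.0], [2.0, 1.0]]  # current utility estimates
counts = [[0, 3], [3, 0]]             # how often each pair has been observed

welfare, assignment = max_weight_matching(optimistic_weights(estimates, counts))
```

For realistic market sizes, the brute-force search would be replaced by a polynomial-time assignment solver; it is used here only to keep the sketch dependency-free.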
