IR GT LG ML
Jun 4, 2024

Dynamic Online Recommendation for Two-Sided Market with Bayesian Incentive Compatibility

arXiv:2406.04374v14.01 citations

Originality: Incremental advance

AI Analysis

This work addresses incentive-aware recommendation for online preference learning, offering a principled approach to handle self-interested user behaviors in two-sided markets, though it is incremental as it builds on existing methods for exploration-exploitation and incentive compatibility.

The paper tackles the design of recommender systems that balance exploration and exploitation while maintaining dynamic incentive compatibility in two-sided markets. It proposes a two-stage algorithm (RCB) that achieves $O(\sqrt{KdT})$ regret and satisfies Bayesian incentive compatibility under a Gaussian prior assumption, and it validates the algorithm empirically in simulations and a real-world warfarin dosing application.

Recommender systems play a crucial role in internet economies by connecting users with relevant products or services. However, designing effective recommender systems faces two key challenges: (1) the exploration-exploitation tradeoff in balancing new product exploration against exploiting known preferences, and (2) dynamic incentive compatibility in accounting for users' self-interested behaviors and heterogeneous preferences. This paper formalizes these challenges into a Dynamic Bayesian Incentive-Compatible Recommendation Protocol (DBICRP). To address the DBICRP, we propose a two-stage algorithm (RCB) that integrates incentivized exploration with an efficient offline learning component for exploitation. In the first stage, our algorithm explores available products while maintaining dynamic incentive compatibility to determine sufficient sample sizes. The second stage employs inverse proportional gap sampling integrated with an arbitrary machine learning method to ensure sublinear regret. Theoretically, we prove that RCB achieves $O(\sqrt{KdT})$ regret and satisfies Bayesian incentive compatibility (BIC) under a Gaussian prior assumption. Empirically, we validate RCB's strong incentive gain, sublinear regret, and robustness through simulations and a real-world application on personalized warfarin dosing. Our work provides a principled approach for incentive-aware recommendation in online preference learning settings.
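The abstract does not spell out the sampling rule used in the second stage, so the following is only a minimal sketch of one common form of inverse gap weighting from the contextual bandit literature, not RCB's actual procedure. The function name `inverse_gap_probabilities`, the exploration parameter `gamma`, and the assumption that predicted rewards come from some offline regression model are all illustrative choices that do not appear in the source.

```python
import numpy as np


def inverse_gap_probabilities(predicted_rewards, gamma):
    """Turn predicted rewards for K arms into sampling probabilities.

    Hypothetical helper: each non-greedy arm gets probability
    1 / (K + gamma * gap), where gap is its predicted shortfall
    relative to the greedy arm. The greedy arm gets the remaining mass.
    """
    scores = np.asarray(predicted_rewards, dtype=float)
    k = scores.size
    best = int(np.argmax(scores))

    # Gap of every arm to the arm with the highest predicted reward.
    gaps = scores[best] - scores

    # Probability shrinks as the gap grows; gamma >= 0 controls how fast.
    probs = 1.0 / (k + gamma * gaps)

    # Give the greedy arm whatever probability mass is left over.
    probs[best] = 0.0
    probs[best] = 1.0 - probs.sum()
    return probs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Made-up predictions for three products.
    probs = inverse_gap_probabilities([1.0, 0.5, 0.0], gamma=10.0)
    arm = rng.choice(len(probs), p=probs)
    print(probs, arm)
```

With `gamma = 0` the rule reduces to uniform sampling over the arms, and larger values of `gamma` concentrate probability on the arm with the highest predicted reward, which is the sense in which sampling is inversely proportional to the gap.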

