LG AI SY ML

Jul 6, 2018

Combinatorial Bandits for Incentivizing Agents with Dynamic Preferences

Tanner Fiez, Shreyas Sekar, Liyuan Zheng, Lillian J. Ratliff

arXiv:1807.02297v12.23 citations

Originality: Synthesis-oriented

AI Analysis

This paper addresses the challenge of improving user engagement on digital platforms by providing a method for dynamic incentive design. The contribution appears incremental, as it combines existing techniques from bandits, matching, and Markov chains.

The paper tackles the problem of matching personalized incentives to users whose preferences are unknown and evolve dynamically, in resource-constrained environments. It proposes a multi-armed bandit algorithm, provides theoretical regret bounds for it, and evaluates its performance on synthetic and bike-sharing examples.

The design of personalized incentives or recommendations to improve user engagement is gaining prominence as digital platform providers continually emerge. We propose a multi-armed bandit framework for matching incentives to users, whose preferences are unknown a priori and evolving dynamically in time, in a resource constrained environment. We design an algorithm that combines ideas from three distinct domains: (i) a greedy matching paradigm, (ii) the upper confidence bound algorithm (UCB) for bandits, and (iii) mixing times from the theory of Markov chains. For this algorithm, we provide theoretical bounds on the regret and demonstrate its performance via both synthetic and realistic (matching supply and demand in a bike-sharing platform) examples.
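The abstract names three ingredients: greedy matching, UCB indices, and mixing times. The following is a minimal Python sketch of how such pieces could fit together, not the authors' algorithm. It assumes static Bernoulli rewards instead of Markov-evolving preferences, and it stands in for the mixing-time idea by holding each matching fixed for `epoch_len` rounds before updating. All names (`ucb_index`, `greedy_matching`, `run`, `true_means`, `epoch_len`) are hypothetical.

```python
import math
import random


def ucb_index(mean, pulls, t):
    """UCB1-style index; unexplored pairs get +inf so they are tried first."""
    if pulls == 0:
        return math.inf
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def greedy_matching(scores):
    """Pick the highest-scoring (agent, incentive) pairs greedily,
    using each agent and each incentive at most once."""
    pairs = sorted(
        ((s, a, i) for a, row in enumerate(scores) for i, s in enumerate(row)),
        key=lambda p: -p[0],
    )
    used_agents, used_incentives, matching = set(), set(), {}
    for _, a, i in pairs:
        if a not in used_agents and i not in used_incentives:
            matching[a] = i
            used_agents.add(a)
            used_incentives.add(i)
    return matching


def run(true_means, epochs, epoch_len, seed=0):
    """Epoch-based loop. Assumption: rewards are i.i.d. Bernoulli with
    means true_means[a][i]; the paper's setting is Markovian instead."""
    rng = random.Random(seed)
    n_agents, n_inc = len(true_means), len(true_means[0])
    means = [[0.0] * n_inc for _ in range(n_agents)]
    pulls = [[0] * n_inc for _ in range(n_agents)]
    total = 0.0
    for k in range(1, epochs + 1):
        scores = [
            [ucb_index(means[a][i], pulls[a][i], k + 1) for i in range(n_inc)]
            for a in range(n_agents)
        ]
        matching = greedy_matching(scores)
        for a, i in matching.items():
            # Hold the pair fixed for the whole epoch and average the rewards.
            r = sum(rng.random() < true_means[a][i] for _ in range(epoch_len)) / epoch_len
            pulls[a][i] += 1
            means[a][i] += (r - means[a][i]) / pulls[a][i]
            total += r
    return pulls, total


pulls, total = run([[0.9, 0.1], [0.1, 0.9]], epochs=50, epoch_len=10)
```

Holding a matching for a full epoch is one simple way to let a state-dependent reward process settle before its average is recorded. How long an epoch must be, and how that length enters the regret bound, is the subject of the paper's analysis and is not captured by this sketch.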
