Incentivising Exploration and Recommendations for Contextual Bandits with Payments
This work aims to improve user engagement on platforms such as e-commerce stores and recommendation engines by using payments to overcome myopic user behavior, though its contribution is incremental: it extends prior work to adversarial contexts.
The paper studies how a platform can pay myopic users to explore different items in a contextual bandit setting, so that it learns item attributes and maximizes social welfare; it achieves sublinear regret and provides theoretical bounds on the cumulative cost of incentivization.
We propose a contextual-bandit-based model to capture the learning and social welfare goals of a web platform in the presence of myopic users. By using payments to incentivize these agents to explore different items/recommendations, we show how the platform can learn the inherent attributes of items and achieve sublinear regret while maximizing cumulative social welfare. We also derive theoretical bounds on the platform's cumulative cost of incentivization. Unlike previous works in this domain, we consider contexts to be completely adversarial, and the behavior of the adversary is unknown to the platform. Our approach can improve various user engagement metrics on e-commerce stores, recommendation engines and matching platforms.
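To make the setting concrete, the following is a minimal simulation sketch and not the paper's algorithm. It assumes a linear reward model (user context dotted with an unknown per-item attribute vector), a LinUCB-style optimistic target item, a myopic user who shares the platform's current estimates, and a payment equal to the estimated gap between the user's greedy item and the platform's target. Contexts are drawn at random purely as a stand-in for the adversarial contexts of the abstract. The function name `simulate_incentivized_bandit` and all parameters are hypothetical.

```python
import numpy as np


def simulate_incentivized_bandit(T=1000, K=5, d=4, alpha=1.0, noise=0.1, seed=0):
    """Illustrative sketch only: returns (cumulative regret, cumulative payments)."""
    rng = np.random.default_rng(seed)
    mu = rng.normal(size=(K, d))        # true item attributes, unknown to the platform
    A = np.stack([np.eye(d)] * K)       # per-item ridge Gram matrices, shape (K, d, d)
    b = np.zeros((K, d))                # per-item reward-weighted context sums
    regret, payments = 0.0, 0.0

    for _ in range(T):
        # Assumption: random unit-norm context standing in for an adversarial one.
        x = rng.normal(size=d)
        x /= np.linalg.norm(x)

        A_inv = np.linalg.inv(A)                        # batched inverse over the K items
        mu_hat = np.einsum("kde,ke->kd", A_inv, b)      # ridge estimates of item attributes
        est = mu_hat @ x                                # estimated reward of each item
        quad = np.einsum("d,kde,e->k", x, A_inv, x)     # x^T A_k^{-1} x for each item
        width = alpha * np.sqrt(np.maximum(quad, 0.0))  # confidence width per item

        greedy = int(np.argmax(est))          # item a myopic user would pick unpaid
        target = int(np.argmax(est + width))  # optimistic item the platform wants explored

        # Assumption: the smallest payment that makes the user indifferent is the
        # estimated reward gap; it is zero when the user already prefers the target.
        payments += est[greedy] - est[target]

        # The user takes the target item; the platform observes a noisy reward.
        reward = mu[target] @ x + noise * rng.normal()
        A[target] += np.outer(x, x)
        b[target] += reward * x

        true_rewards = mu @ x
        regret += true_rewards.max() - true_rewards[target]

    return float(regret), float(payments)


if __name__ == "__main__":
    total_regret, total_payments = simulate_incentivized_bandit()
    print(f"regret={total_regret:.2f}, payments={total_payments:.2f}")
```

Under these assumptions, the payment in a round is positive only when the platform's optimistic choice differs from the user's greedy choice, so the cumulative payment tracks how often the platform has to override myopic behavior. Comparing the two returned totals across different values of `T` is one informal way to examine how regret and incentivization cost grow in this toy setting.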