LG, Apr 8, 2021

Incentivizing Exploration in Linear Bandits under Information Gap

arXiv:2104.03860v1, 5 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of balancing exploration and exploitation in bandit algorithms that serve myopic users, and makes an incremental advance in handling information gaps between users and the system.

The paper tackles the problem of incentivizing exploration in linear bandits when users have more informative context features than the system, proposing a method that achieves sublinear regret and sublinear compensation.

We study the problem of incentivizing exploration for myopic users in linear bandits, where the users tend to exploit the arm with the highest predicted reward instead of exploring. To maximize the long-term reward, the system offers compensation to incentivize the users to pull exploratory arms, with the goal of balancing the trade-off among exploitation, exploration, and compensation. We consider a new and practically motivated setting where the context features observed by the user are more informative than those used by the system, e.g., features based on users' private information are not accessible to the system. We propose a new method to incentivize exploration under such an information gap, and prove that the method achieves both sublinear regret and sublinear compensation. We theoretically and empirically analyze the added compensation due to the information gap, compared with the case where the system has access to the same context features as the user, i.e., without an information gap. We also provide a compensation lower bound for our problem.
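To make the trade-off among exploitation, exploration, and compensation concrete, the following is a minimal sketch of the simpler baseline the abstract mentions, where the user and the system share the same features (no information gap). It is not the paper's method. It assumes a common convention in which the system recommends an upper-confidence-bound arm and pays the user the difference between the predicted reward of the user's greedy arm and that of the recommended arm. All names (`RidgeEstimator`, `compensation`, `simulate`) and parameters (`alpha`, `noise`, the uniform contexts) are hypothetical choices made for illustration.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(m, v):
    return [dot(row, v) for row in m]


class RidgeEstimator:
    """Ridge regression kept up to date with Sherman-Morrison rank-one updates."""

    def __init__(self, dim, lam=1.0):
        self.dim = dim
        # Inverse of A = lam * I + sum of x x^T over observed contexts.
        self.a_inv = [[(1.0 / lam) if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def update(self, x, reward):
        ax = mat_vec(self.a_inv, x)  # a_inv is symmetric
        denom = 1.0 + dot(x, ax)
        for i in range(self.dim):
            for j in range(self.dim):
                self.a_inv[i][j] -= ax[i] * ax[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]

    def theta(self):
        return mat_vec(self.a_inv, self.b)

    def predict(self, x):
        return dot(self.theta(), x)

    def width(self, x):
        # Confidence width sqrt(x^T A^{-1} x); clamp tiny negative round-off.
        return math.sqrt(max(0.0, dot(x, mat_vec(self.a_inv, x))))


def compensation(predicted, recommended):
    """Assumed payment rule: the myopic user's predicted loss from deviating."""
    return max(predicted) - predicted[recommended]


def simulate(rounds=500, n_arms=5, dim=3, alpha=1.0, noise=0.1, seed=0):
    rng = random.Random(seed)
    true_theta = [rng.uniform(-1, 1) for _ in range(dim)]
    est = RidgeEstimator(dim)  # shared by user and system: no information gap
    total_regret = 0.0
    total_comp = 0.0
    for _ in range(rounds):
        arms = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_arms)]
        predicted = [est.predict(x) for x in arms]
        # The system prefers the arm with the highest upper confidence bound.
        ucb = [p + alpha * est.width(x) for p, x in zip(predicted, arms)]
        chosen = max(range(n_arms), key=lambda a: ucb[a])
        # The user would pick argmax(predicted); pay them to pull `chosen`.
        total_comp += compensation(predicted, chosen)
        expected = [dot(true_theta, x) for x in arms]
        total_regret += max(expected) - expected[chosen]
        est.update(arms[chosen], expected[chosen] + rng.gauss(0, noise))
    return total_regret, total_comp


if __name__ == "__main__":
    regret, comp = simulate()
    print(f"cumulative regret: {regret:.2f}, cumulative compensation: {comp:.2f}")
```

In this sketch the compensation is zero whenever the recommended arm coincides with the user's greedy arm, so payments shrink as the confidence widths shrink. Under an information gap the system cannot compute the user's predictions directly, which is the difficulty the paper addresses and which this sketch deliberately leaves out.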

