Geometry Meets Incentives: Sample-Efficient Incentivized Exploration with Linear Contexts
This work removes a key bottleneck in incentivized learning with self-interested agents: it enables efficient exploration without exogenous data. The contribution is incremental in that it builds on prior work on geometric conditions.
The paper tackles incentivized exploration in linear bandits, where acquiring initial data can require exponentially many samples in high dimensions. It shows that under mild geometric conditions on the action set, an incentive-compatible algorithm achieves sample complexity polynomial in the dimension.
In the incentivized exploration model, a principal aims to explore and learn over time by interacting with a sequence of self-interested agents. It has recently been understood that the main challenge in designing incentive-compatible algorithms for this problem is to gather a moderate amount of initial data, after which one can obtain near-optimal regret via posterior sampling. With high-dimensional contexts, however, this initial exploration phase requires exponential sample complexity in some cases, which prevents efficient learning unless initial data can be acquired exogenously. We show that these barriers to exploration disappear under mild geometric conditions on the set of available actions, in which case incentive-compatibility does not preclude regret-optimality. Namely, we consider the linear bandit model with actions in the Euclidean unit ball, and give an incentive-compatible exploration algorithm with sample complexity that scales polynomially with the dimension and other parameters.
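To make the setting concrete, here is a minimal sketch of plain posterior sampling (Thompson sampling) in a linear bandit whose action set is the Euclidean unit ball. It is not the paper's incentive-compatible algorithm and it ignores the initial exploration phase entirely. The standard Gaussian prior, the Gaussian reward noise, and the names `posterior_sampling_unit_ball`, `theta_star`, `n_rounds`, and `noise_std` are all assumptions made for illustration.

```python
import numpy as np


def posterior_sampling_unit_ball(theta_star, n_rounds, noise_std=1.0, seed=0):
    """Thompson sampling for a linear bandit with unit-ball actions.

    Assumed model (not taken from the paper): prior theta ~ N(0, I) and
    reward = <action, theta_star> + Gaussian noise with std noise_std.
    """
    rng = np.random.default_rng(seed)
    d = theta_star.shape[0]
    precision = np.eye(d)   # posterior precision matrix, starts at the prior
    b = np.zeros(d)         # precision-weighted sum of reward * action
    actions = np.zeros((n_rounds, d))
    regrets = np.zeros(n_rounds)
    best_reward = np.linalg.norm(theta_star)  # attained at theta*/||theta*||

    for t in range(n_rounds):
        cov = np.linalg.inv(precision)
        cov = (cov + cov.T) / 2.0             # symmetrize against round-off
        mean = cov @ b
        theta_sample = rng.multivariate_normal(mean, cov)

        # On the unit ball, the best action for a sampled parameter is the
        # sample normalized to unit length.
        norm = np.linalg.norm(theta_sample)
        action = theta_sample / norm if norm > 0 else np.eye(d)[0]

        reward = action @ theta_star + noise_std * rng.normal()

        # Conjugate Gaussian update of the posterior.
        precision += np.outer(action, action) / noise_std**2
        b += reward * action / noise_std**2

        actions[t] = action
        regrets[t] = best_reward - action @ theta_star

    posterior_mean = np.linalg.solve(precision, b)
    return posterior_mean, actions, regrets


if __name__ == "__main__":
    theta = np.array([0.6, -0.3, 0.2])
    mean, actions, regrets = posterior_sampling_unit_ball(theta, n_rounds=500)
    print("posterior mean:", mean)
    print("cumulative regret:", regrets.sum())
```

In this sketch every recommended action is simply the normalized posterior sample. The incentivized setting adds the constraint that each agent must be willing to follow the recommendation, which is why the abstract stresses the need for enough initial data before posterior sampling can be used.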