Resourceful Contextual Bandits
This addresses resource-constrained decision-making problems in real-world applications like advertising and pricing, representing a novel integration of two established frameworks.
The paper tackles contextual bandits with resource constraints, common in applications like ad selection and dynamic pricing, by designing the first algorithm that handles non-time resource constraints and improves over trivial reductions to non-contextual cases, proving a regret guarantee with near-optimal statistical properties.
We study contextual bandits with ancillary constraints on resources, which are common in real-world applications such as choosing ads or dynamic pricing of items. We design the first algorithm for solving these problems that handles constrained resources other than time, and improves over a trivial reduction to the non-contextual case. We consider very general settings for both contextual bandits (arbitrary policy sets, e.g. Dudik et al. (UAI'11)) and bandits with resource constraints (bandits with knapsacks, Badanidiyuru et al. (FOCS'13)), and prove a regret guarantee with near-optimal statistical properties.