Combinatorial Semi-Bandits with Knapsacks
This work addresses a theoretical gap in multi-armed bandit research by combining two important models, which is incremental but useful for researchers in online learning and resource allocation.
The paper tackles the problem of unifying bandits with knapsacks and combinatorial semi-bandits into a common generalization, and it results in an algorithm with regret bounds comparable to those of the individual frameworks.
We unify two prominent lines of work on multi-armed bandits: bandits with knapsacks (BwK) and combinatorial semi-bandits. The former concerns limited "resources" consumed by the algorithm, e.g., limited supply in dynamic pricing. The latter allows a huge number of actions but assumes combinatorial structure and additional feedback to make the problem tractable. We define a common generalization, support it with several motivating examples, and design an algorithm for it. Our regret bounds are comparable with those for BwK and combinatorial semi- bandits.