Budgeted Combinatorial Multi-Armed Bandits
This work addresses a more general setting in combinatorial bandits for researchers in online learning and optimization, but it is incremental as it builds on existing techniques like Bandits with Knapsacks.
The paper tackles the problem of minimizing total expected regret in a budgeted combinatorial multi-armed bandit setting where the algorithm can dynamically decide how many arms to pull per round based on remaining budget and rounds, proposing two algorithms (CBwK-Greedy-UCB and CBwK-LPUCB) with proven regret bounds and experimental results showing CBwK-Greedy-UCB performs slightly better, with regret approaching zero for very high budgets.
We consider a budgeted combinatorial multi-armed bandit setting where, in every round, the algorithm selects a super-arm consisting of one or more arms. The goal is to minimize the total expected regret after all rounds within a limited budget. Existing techniques in this literature either fix the budget per round or fix the number of arms pulled in each round. Our setting is more general where based on the remaining budget and remaining number of rounds, the algorithm can decide how many arms to be pulled in each round. First, we propose CBwK-Greedy-UCB algorithm, which uses a greedy technique, CBwK-Greedy, to allocate the arms to the rounds. Next, we propose a reduction of this problem to Bandits with Knapsacks (BwK) with a single pull. With this reduction, we propose CBwK-LPUCB that uses PrimalDualBwK ingeniously. We rigorously prove regret bounds for CBwK-LP-UCB. We experimentally compare the two algorithms and observe that CBwK-Greedy-UCB performs incrementally better than CBwK-LP-UCB. We also show that for very high budgets, the regret goes to zero.