LGJun 12, 2021

Simple Combinatorial Algorithms for Combinatorial Bandits: Corruptions and Approximations

arXiv:2106.06712v15.59 citationsh-index: 24

Originality Incremental advance

AI Analysis

This work addresses the problem of efficient and robust decision-making in combinatorial bandits for researchers and practitioners, offering simpler and faster algorithms with competitive performance, though it is incremental in improving existing methods.

The paper tackles the stochastic combinatorial semi-bandit problem with adversarial corruptions by providing a simple combinatorial algorithm that achieves a regret of $ ilde{O}(C + d^2K/\Delta_{min})$, improving on prior combinatorial bounds and nearly matching more complex convex program-based methods, while also handling approximation oracles with a regret of $ ilde{O}(d\sqrt{KT})$.

We consider the stochastic combinatorial semi-bandit problem with adversarial corruptions. We provide a simple combinatorial algorithm that can achieve a regret of $\tilde{O}\left(C+d^2K/Δ_{min}\right)$ where $C$ is the total amount of corruptions, $d$ is the maximal number of arms one can play in each round, $K$ is the number of arms. If one selects only one arm in each round, we achieves a regret of $\tilde{O}\left(C+\sum_{Δ_i>0}(1/Δ_i)\right)$. Our algorithm is combinatorial and improves on the previous combinatorial algorithm by [Gupta et al., COLT2019] (their bound is $\tilde{O}\left(KC+\sum_{Δ_i>0}(1/Δ_i)\right)$), and almost matches the best known bounds obtained by [Zimmert et al., ICML2019] and [Zimmert and Seldin, AISTATS2019] (up to logarithmic factor). Note that the algorithms in [Zimmert et al., ICML2019] and [Zimmert and Seldin, AISTATS2019] require one to solve complex convex programs while our algorithm is combinatorial, very easy to implement, requires weaker assumptions and has very low oracle complexity and running time. We also study the setting where we only get access to an approximation oracle for the stochastic combinatorial semi-bandit problem. Our algorithm achieves an (approximation) regret bound of $\tilde{O}\left(d\sqrt{KT}\right)$. Our algorithm is very simple, only worse than the best known regret bound by $\sqrt{d}$, and has much lower oracle complexity than previous work.

View on arXiv PDF

Similar