ML · LG · Feb 22, 2023

When Combinatorial Thompson Sampling meets Approximation Regret

arXiv:2302.11182v1 · 9 citations · h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses a specific drawback of Combinatorial Thompson Sampling: with some non-exact oracles, its approximation regret can be poor. It offers an incremental improvement in regret bounds under a specific condition on the oracle.

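The paper tackles the poor approximation regret that Combinatorial Thompson Sampling (CTS) can incur in combinatorial multi-armed bandits when the oracle is not exact. It introduces a condition on the approximation oracle (REDUCE2EXACT) under which CTS achieves an $\mathcal{O}(\log(T)/Δ)$ approximation regret upper bound. This extends the analysis beyond the greedy oracle and, through probabilistically triggered arms, to problems such as online influence maximization.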
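
For reference, approximation regret is commonly measured against a fraction $\alpha$ of the optimal expected reward rather than against the optimum itself. The following is a sketch under assumed notation: the symbols $\alpha$, $\mathcal{A}$ (action set), $r$ (expected reward function), $\mu$ (true mean vector), and $A_t$ (action played at round $t$) do not appear in the text here and are introduced only for illustration.

$$R_\alpha(T) = \alpha \, T \max_{A \in \mathcal{A}} r(A; \mu) \;-\; \mathbb{E}\left[\sum_{t=1}^{T} r(A_t; \mu)\right]$$

Under this convention, an $\alpha$-approximation oracle is one that, for any input vector $\mu'$, returns an action whose reward is at least $\alpha \max_{A \in \mathcal{A}} r(A; \mu')$. A bound of order $\log(T)/Δ$ on $R_\alpha(T)$ then means the policy competes with the $\alpha$-scaled optimum up to logarithmic loss, whereas linear scaling in $T$ means it does not.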

We study the Combinatorial Thompson Sampling policy (CTS) for combinatorial multi-armed bandit problems (CMAB), within an approximation regret setting. Although CTS has attracted a lot of interest, it has a drawback that other usual CMAB policies do not have when considering non-exact oracles: for some oracles, CTS has a poor approximation regret (scaling linearly with the time horizon $T$) [Wang and Chen, 2018]. A study is then necessary to discriminate the oracles on which CTS could learn. This study was started by Kong et al. [2021]: they gave the first approximation regret analysis of CTS for the greedy oracle, obtaining an upper bound of order $\mathcal{O}(\log(T)/Δ^2)$, where $Δ$ is some minimal reward gap. In this paper, our objective is to push this study further than the simple case of the greedy oracle. We provide the first $\mathcal{O}(\log(T)/Δ)$ approximation regret upper bound for CTS, obtained under a specific condition on the approximation oracle, allowing a reduction to the exact oracle analysis. We thus term this condition REDUCE2EXACT, and observe that it is satisfied in many concrete examples. Moreover, it can be extended to the probabilistically triggered arms setting, thus capturing even more problems, such as online influence maximization.
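
To make the policy concrete, here is a minimal sketch of CTS with semi-bandit feedback. It rests on several assumptions that the abstract does not state: arms are Bernoulli, priors are Beta(1, 1), the reward is the sum of the chosen arms' means, and the oracle is a hypothetical `top_k_oracle` that picks the `k` largest sampled means (exact in this linear case). The function names `cts` and `top_k_oracle` are illustrative only. The paper's question is what happens when `oracle` is replaced by an approximation oracle.

```python
import random


def top_k_oracle(theta, k):
    """Hypothetical oracle: indices of the k largest entries of theta.

    For a linear reward with a cardinality constraint this is an exact
    oracle; an approximation oracle could be passed to cts() instead.
    """
    order = sorted(range(len(theta)), key=lambda i: theta[i], reverse=True)
    return order[:k]


def cts(true_means, k, horizon, oracle=top_k_oracle, seed=0):
    """Run a CTS sketch on Bernoulli arms and return per-arm pull counts."""
    rng = random.Random(seed)
    n = len(true_means)
    successes = [1] * n  # Beta posterior parameter a (prior Beta(1, 1))
    failures = [1] * n   # Beta posterior parameter b
    pulls = [0] * n

    for _ in range(horizon):
        # 1. Sample a mean vector from the current posterior.
        theta = [rng.betavariate(successes[i], failures[i]) for i in range(n)]
        # 2. Ask the oracle for an action given the sampled means.
        action = oracle(theta, k)
        # 3. Play the action and observe each chosen arm (semi-bandit feedback).
        for i in action:
            outcome = 1 if rng.random() < true_means[i] else 0
            successes[i] += outcome
            failures[i] += 1 - outcome
            pulls[i] += 1
    return pulls


if __name__ == "__main__":
    # Assumed toy instance: five arms, choose two per round.
    print(cts([0.9, 0.8, 0.3, 0.2, 0.1], k=2, horizon=2000))
```

With the exact oracle, the pull counts should concentrate on the two best arms. Swapping in a different `oracle` callable with the same signature is the simplest way to experiment with the oracle dependence the abstract describes.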

