LG MLSep 7, 2018

Analysis of Thompson Sampling for Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms

arXiv:1809.02707v27.112 citations

Originality Incremental advance

AI Analysis

This work addresses the problem of optimizing decision-making in complex bandit settings for researchers in machine learning, but it is incremental as it extends existing methods to a specific variant with probabilistic triggering.

The paper tackles the regret analysis of combinatorial Thompson sampling (CTS) for a combinatorial multi-armed bandit with probabilistically triggered arms under semi-bandit feedback, deriving an O(∑_{i=1}^m log T / (p_i Δ_i)) regret bound under Lipschitz continuity assumptions and comparing CTS with combinatorial upper confidence bound via numerical experiments on a cascading bandit problem.

We analyze the regret of combinatorial Thompson sampling (CTS) for the combinatorial multi-armed bandit with probabilistically triggered arms under the semi-bandit feedback setting. We assume that the learner has access to an exact optimization oracle but does not know the expected base arm outcomes beforehand. When the expected reward function is Lipschitz continuous in the expected base arm outcomes, we derive $O(\sum_{i =1}^m \log T / (p_i Δ_i))$ regret bound for CTS, where $m$ denotes the number of base arms, $p_i$ denotes the minimum non-zero triggering probability of base arm $i$ and $Δ_i$ denotes the minimum suboptimality gap of base arm $i$. We also compare CTS with combinatorial upper confidence bound (CUCB) via numerical experiments on a cascading bandit problem.

View on arXiv PDF

Similar