LG ML Apr 17, 2021

Conservative Contextual Combinatorial Cascading Bandit

Kun Wang, Canzhe Zhao, Shuai Li, Shuo Shao

arXiv:2104.08615v2 5.56 citations

Originality: Incremental advance

AI Analysis

This work addresses decision-making challenges in domains such as recommendation systems by providing a conservative mechanism that guards against poor performance. It is an incremental advance, since it builds on existing bandit frameworks.

The paper tackles the exploration-exploitation tradeoff in online decision-making by introducing a conservative contextual combinatorial cascading bandit model, which ensures that recommendations perform no worse than a base strategy. It proposes an algorithm whose proven regret bounds decompose into a general bandit term and a constant conservative term.

A conservative mechanism is a desirable property in decision-making problems that must balance the tradeoff between exploration and exploitation. We propose the novel \emph{conservative contextual combinatorial cascading bandit ($C^4$-bandit)}, a cascading online learning game that incorporates the conservative mechanism. At each time step, the learning agent is given some contexts and must recommend a list of items that performs no worse than the base strategy; it then observes the reward according to some stopping rule. We design the $C^4$-UCB algorithm to solve the problem and prove its $n$-step regret upper bound in two situations: known baseline reward and unknown baseline reward. The regret in both situations can be decomposed into two terms: (a) the upper bound for the general contextual combinatorial cascading bandit; and (b) a constant term for the regret from the conservative mechanism. We also improve the bound of the conservative contextual combinatorial bandit as a by-product. Experiments on synthetic data demonstrate the algorithm's advantages and validate our theoretical analysis.
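
To make the interplay between cascading feedback and the conservative constraint concrete, here is a minimal Python sketch. It is not the paper's $C^4$-UCB algorithm. It assumes a disjunctive stopping rule (the user stops at the first attractive item) and a conservative check of the form commonly used in conservative bandits, where the optimistic list is played only if a pessimistic estimate of cumulative reward stays above a $(1-\alpha)$ fraction of the baseline's cumulative reward. All function names, parameters, and the tolerance `alpha` are hypothetical choices made for illustration.

```python
import math
import random


def cascade_feedback(ranked_items, attraction_prob, rng):
    """Simulate an assumed disjunctive cascade: the user scans the list
    top-down and stops at the first attractive item.
    Returns (reward, number_of_items_observed)."""
    for position, item in enumerate(ranked_items):
        if rng.random() < attraction_prob[item]:
            return 1, position + 1  # stopped here; later items are unobserved
    return 0, len(ranked_items)  # whole list scanned, nothing attractive


def expected_cascade_reward(ranked_items, weights):
    """Probability that at least one item in the list is attractive."""
    return 1.0 - math.prod(1.0 - weights[i] for i in ranked_items)


def choose_list(optimistic_list, baseline_list, lower_bound, baseline_reward,
                banked_lower_reward, rounds_played, alpha):
    """Hypothetical conservative check (an assumption, not the paper's rule).

    lower_bound: per-item lower confidence bounds on attraction probability.
    baseline_reward: expected per-round reward of the base strategy (known case).
    banked_lower_reward: pessimistic estimate of reward collected so far.
    """
    # The cascade reward is monotone in each weight, so plugging in lower
    # confidence bounds yields a pessimistic value for the optimistic list.
    pessimistic_total = banked_lower_reward + expected_cascade_reward(
        optimistic_list, lower_bound)
    required = (1.0 - alpha) * baseline_reward * (rounds_played + 1)
    if pessimistic_total >= required:
        return optimistic_list
    return baseline_list  # fall back to the safe base strategy
```

Early on, when the lower confidence bounds are loose, the check fails and the sketch falls back to the baseline list. Once enough pessimistic reward has been banked, it can afford to explore with the optimistic list.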
