MLLGJan 5, 2021

Adversarial Combinatorial Bandits with General Non-linear Reward Functions

arXiv:2101.01301v120 citations
Originality Highly original
AI Analysis

This work addresses a fundamental open problem in bandit literature by clarifying the regret bounds for non-linear reward functions, revealing a significant gap compared to linear reward structures for researchers in bandit algorithms.

This paper investigates adversarial combinatorial bandits with known non-linear reward functions, extending prior work on linear cases. It establishes that the minimax optimal regret is \u03f4(sqrt(N^d T)) for d-degree polynomial reward functions (d < K) and \u03f4(sqrt(N^K T)) for non-low-degree polynomial functions, where N is the number of arms, K is the subset size, and T is the time horizon.

In this paper we study the adversarial combinatorial bandit with a known non-linear reward function, extending existing work on adversarial linear combinatorial bandit. {The adversarial combinatorial bandit with general non-linear reward is an important open problem in bandit literature, and it is still unclear whether there is a significant gap from the case of linear reward, stochastic bandit, or semi-bandit feedback.} We show that, with $N$ arms and subsets of $K$ arms being chosen at each of $T$ time periods, the minimax optimal regret is $\widetildeΘ_{d}(\sqrt{N^d T})$ if the reward function is a $d$-degree polynomial with $d< K$, and $Θ_K(\sqrt{N^K T})$ if the reward function is not a low-degree polynomial. {Both bounds are significantly different from the bound $O(\sqrt{\mathrm{poly}(N,K)T})$ for the linear case, which suggests that there is a fundamental gap between the linear and non-linear reward structures.} Our result also finds applications to adversarial assortment optimization problem in online recommendation. We show that in the worst-case of adversarial assortment problem, the optimal algorithm must treat each individual $\binom{N}{K}$ assortment as independent.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes