LG Nov 10, 2023

Sum-max Submodular Bandits

Stephen Pasteris, Alberto Rumi, Fabio Vitale, Nicolò Cesa-Bianchi

arXiv:2311.05975v15.32 citations h-index: 48

Originality: Incremental advance

AI Analysis

This paper addresses online maximization of submodular functions under bandit feedback and gives improved regret guarantees for problems such as combinatorial bandits. The advance is rated incremental because it builds on existing submodular maximization frameworks.

The paper studies online decision-making problems that amount to maximizing a sequence of submodular functions. It introduces sum-max functions, a subclass of monotone submodular functions that satisfy a property called pseudo-concavity, and proves $\big(1 - \frac{1}{e}\big)$-regret bounds of order $\sqrt{MKT}$ (ignoring log factors) under bandit feedback, improving on the prior $\widetilde{O}\big(T^{2/3}\big)$ bound.

Many online decision-making problems correspond to maximizing a sequence of submodular functions. In this work, we introduce sum-max functions, a subclass of monotone submodular functions capturing several interesting problems, including best-of-$K$-bandits, combinatorial bandits, and the bandit versions of facility location, $M$-medians, and hitting sets. We show that all functions in this class satisfy a key property that we call pseudo-concavity. This allows us to prove $\big(1 - \frac{1}{e}\big)$-regret bounds for bandit feedback in the nonstochastic setting of the order of $\sqrt{MKT}$ (ignoring log factors), where $T$ is the time horizon and $M$ is a cardinality constraint. This bound, attained by a simple and efficient algorithm, significantly improves on the $\widetilde{O}\big(T^{2/3}\big)$ regret bound for online monotone submodular maximization with bandit feedback.
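To make the $\big(1 - \frac{1}{e}\big)$ benchmark concrete, here is a minimal offline sketch in Python. It is not the paper's bandit algorithm. It assumes a facility-location-style objective $f(S) = \sum_i \max_{a \in S} w_{i,a}$ with nonnegative weights as a stand-in for a sum-max function (the formal definition is not reproduced in this listing), and it uses the classical greedy rule, which is known to reach at least a $\big(1 - \frac{1}{e}\big)$ fraction of the optimum for monotone submodular functions under a cardinality constraint $M$. The names `sum_max`, `greedy`, and `WEIGHTS` are hypothetical.

```python
def sum_max(weights, chosen):
    """Assumed objective: for each row, take the best weight among the
    chosen columns, then sum over rows. The empty set has value 0."""
    if not chosen:
        return 0.0
    return sum(max(row[a] for a in chosen) for row in weights)


def greedy(weights, m):
    """Classical offline greedy: add, m times, the column that yields
    the largest objective value when joined to the current set."""
    n_actions = len(weights[0])
    chosen = []
    for _ in range(m):
        best = max(
            (a for a in range(n_actions) if a not in chosen),
            key=lambda a: sum_max(weights, chosen + [a]),
        )
        chosen.append(best)
    return chosen


# Hypothetical nonnegative weights: 4 clients (rows) x 3 facilities (columns).
WEIGHTS = [
    [3.0, 1.0, 0.0],
    [0.0, 2.0, 1.0],
    [1.0, 0.0, 4.0],
    [2.0, 2.0, 0.0],
]

if __name__ == "__main__":
    picked = greedy(WEIGHTS, 2)
    print(picked, sum_max(WEIGHTS, picked))
```

In the online bandit setting of the paper, the weights change over time and only the value of the played set is observed, so this full-information greedy rule serves only as an illustration of the comparator, not of the learning procedure.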
