LG CO OC MLOct 11, 2024

Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit

Julien Zhou, Pierre Gaillard, Thibaud Rahier, Julyan Arbel

arXiv:2410.08578v24.61 citationsh-index: 18ALT

Originality Incremental advance

AI Analysis

This addresses the challenge of optimizing non-monotone submodular functions under limited feedback for decision-makers in online settings, representing an incremental improvement over prior bandit algorithms.

The paper tackles the online unconstrained submodular maximization problem with stochastic bandit feedback by proposing DG-ETC, achieving a problem-dependent regret bound of O(d log(dT)) and a problem-free bound of O(dT^{2/3} log(dT)^{1/3}), which outperform existing methods.

We address the online unconstrained submodular maximization problem (Online USM), in a setting with stochastic bandit feedback. In this framework, a decision-maker receives noisy rewards from a non monotone submodular function taking values in a known bounded interval. This paper proposes Double-Greedy - Explore-then-Commit (DG-ETC), adapting the Double-Greedy approach from the offline and online full-information settings. DG-ETC satisfies a $O(d\log(dT))$ problem-dependent upper bound for the $1/2$-approximate pseudo-regret, as well as a $O(dT^{2/3}\log(dT)^{1/3})$ problem-free one at the same time, outperforming existing approaches. In particular, we introduce a problem-dependent notion of hardness characterizing the transition between logarithmic and polynomial regime for the upper bounds.

View on arXiv PDF

Similar