LGCOOCMLOct 11, 2024

Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit

arXiv:2410.08578v21 citationsh-index: 18ALT
Originality Incremental advance
AI Analysis

This addresses the challenge of optimizing non-monotone submodular functions under limited feedback for decision-makers in online settings, representing an incremental improvement over prior bandit algorithms.

The paper tackles the online unconstrained submodular maximization problem with stochastic bandit feedback by proposing DG-ETC, achieving a problem-dependent regret bound of O(d log(dT)) and a problem-free bound of O(dT^{2/3} log(dT)^{1/3}), which outperform existing methods.

We address the online unconstrained submodular maximization problem (Online USM), in a setting with stochastic bandit feedback. In this framework, a decision-maker receives noisy rewards from a non monotone submodular function taking values in a known bounded interval. This paper proposes Double-Greedy - Explore-then-Commit (DG-ETC), adapting the Double-Greedy approach from the offline and online full-information settings. DG-ETC satisfies a $O(d\log(dT))$ problem-dependent upper bound for the $1/2$-approximate pseudo-regret, as well as a $O(dT^{2/3}\log(dT)^{1/3})$ problem-free one at the same time, outperforming existing approaches. In particular, we introduce a problem-dependent notion of hardness characterizing the transition between logarithmic and polynomial regime for the upper bounds.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes