Online Continuous Submodular Maximization: From Full-Information to Bandit Feedback
This work addresses optimization problems in machine learning and AI, particularly for online decision-making with submodular functions, offering incremental improvements in efficiency and regret bounds.
The paper tackles online continuous submodular maximization by proposing algorithms that reduce gradient evaluations and achieve regret bounds, with Mono-Frank-Wolfe achieving O(T^{4/5}) regret and Bandit-Frank-Wolfe achieving O(T^{8/9}) regret in bandit settings.
In this paper, we propose three online algorithms for submodular maximisation. The first one, Mono-Frank-Wolfe, reduces the number of per-function gradient evaluations from $T^{1/2}$ [Chen2018Online] and $T^{3/2}$ [chen2018projection] to 1, and achieves a $(1-1/e)$-regret bound of $O(T^{4/5})$. The second one, Bandit-Frank-Wolfe, is the first bandit algorithm for continuous DR-submodular maximization, which achieves a $(1-1/e)$-regret bound of $O(T^{8/9})$. Finally, we extend Bandit-Frank-Wolfe to a bandit algorithm for discrete submodular maximization, Responsive-Frank-Wolfe, which attains a $(1-1/e)$-regret bound of $O(T^{8/9})$ in the responsive bandit setting.