LG MLJul 29, 2022

Best-of-Both-Worlds Algorithms for Partial Monitoring

arXiv:2207.14550v314.618 citationsh-index: 23

Originality Incremental advance

AI Analysis

This provides improved algorithms for sequential decision-making under partial feedback, addressing a core challenge in online learning, though it is incremental as it builds on existing frameworks like follow-the-regularized-leader.

The study tackles the partial monitoring problem by developing best-of-both-worlds algorithms that achieve favorable regret bounds in both stochastic and adversarial regimes, with specific bounds such as O(m^2 k^4 log(T) log(k_Π T) / Δ_min) for non-degenerate locally observable games in the stochastic regime.

This study considers the partial monitoring problem with $k$-actions and $d$-outcomes and provides the first best-of-both-worlds algorithms, whose regrets are favorably bounded both in the stochastic and adversarial regimes. In particular, we show that for non-degenerate locally observable games, the regret is $O(m^2 k^4 \log(T) \log(k_Π T) / Δ_{\min})$ in the stochastic regime and $O(m k^{2/3} \sqrt{T \log(T) \log k_Π})$ in the adversarial regime, where $T$ is the number of rounds, $m$ is the maximum number of distinct observations per action, $Δ_{\min}$ is the minimum suboptimality gap, and $k_Π$ is the number of Pareto optimal actions. Moreover, we show that for globally observable games, the regret is $O(c_{\mathcal{G}}^2 \log(T) \log(k_Π T) / Δ_{\min}^2)$ in the stochastic regime and $O((c_{\mathcal{G}}^2 \log(T) \log(k_Π T))^{1/3} T^{2/3})$ in the adversarial regime, where $c_{\mathcal{G}}$ is a game-dependent constant. We also provide regret bounds for a stochastic regime with adversarial corruptions. Our algorithms are based on the follow-the-regularized-leader framework and are inspired by the approach of exploration by optimization and the adaptive learning rate in the field of online learning with feedback graphs.

View on arXiv PDF

Similar