LG DSOct 9, 2022

Learning on the Edge: Online Learning with Stochastic Feedback Graphs

Emmanuel Esposito, Federico Fusco, Dirk van der Hoeven, Nicolò Cesa-Bianchi

arXiv:2210.04229v111.814 citationsh-index: 15

Originality Incremental advance

AI Analysis

This work addresses a generalization of sequential decision-making for machine learning and optimization, offering incremental improvements in regret bounds for stochastic feedback scenarios.

The paper tackles the problem of online learning with stochastic feedback graphs, where edges are realized probabilistically each round, and proves nearly optimal regret bounds of order min{√((α_ε/ε)T), (δ_ε/ε)^{1/3} T^{2/3}} without prior knowledge of the graph, while also developing a more efficient algorithm when the entire graph realization is observed.

The framework of feedback graphs is a generalization of sequential decision-making with bandit or full information feedback. In this work, we study an extension where the directed feedback graph is stochastic, following a distribution similar to the classical Erdős-Rényi model. Specifically, in each round every edge in the graph is either realized or not with a distinct probability for each edge. We prove nearly optimal regret bounds of order $\min\bigl\{\min_{\varepsilon} \sqrt{(α_\varepsilon/\varepsilon) T},\, \min_{\varepsilon} (δ_\varepsilon/\varepsilon)^{1/3} T^{2/3}\bigr\}$ (ignoring logarithmic factors), where $α_{\varepsilon}$ and $δ_{\varepsilon}$ are graph-theoretic quantities measured on the support of the stochastic feedback graph $\mathcal{G}$ with edge probabilities thresholded at $\varepsilon$. Our result, which holds without any preliminary knowledge about $\mathcal{G}$, requires the learner to observe only the realized out-neighborhood of the chosen action. When the learner is allowed to observe the realization of the entire graph (but only the losses in the out-neighborhood of the chosen action), we derive a more efficient algorithm featuring a dependence on weighted versions of the independence and weak domination numbers that exhibits improved bounds for some special cases.

View on arXiv PDF

Similar