MLLGOct 27, 2015

Online Learning with Gaussian Payoffs and Side Observations

arXiv:1510.08108v146 citations
Originality Incremental advance
AI Analysis

This work addresses a refined information transfer model in online learning, offering theoretical guarantees for algorithms in scenarios with side observations, which is incremental but provides new bounds.

The paper tackles the problem of sequential learning with Gaussian payoffs and side observations, where the learner receives noisy information about all actions after choosing one, and provides non-asymptotic problem-dependent lower bounds on regret for the first time, along with algorithms that achieve these bounds up to constant or logarithmic factors.

We consider a sequential learning problem with Gaussian payoffs and side information: after selecting an action $i$, the learner receives information about the payoff of every action $j$ in the form of Gaussian observations whose mean is the same as the mean payoff, but the variance depends on the pair $(i,j)$ (and may be infinite). The setup allows a more refined information transfer from one action to another than previous partial monitoring setups, including the recently introduced graph-structured feedback case. For the first time in the literature, we provide non-asymptotic problem-dependent lower bounds on the regret of any algorithm, which recover existing asymptotic problem-dependent lower bounds and finite-time minimax lower bounds available in the literature. We also provide algorithms that achieve the problem-dependent lower bound (up to some universal constant factor) or the minimax lower bounds (up to logarithmic factors).

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes