Adversarial Online Learning with Temporal Feedback Graphs
This work addresses a fundamental challenge in online learning for scenarios with structured feedback delays, which is incremental but provides optimal solutions for important graph classes.
The paper tackles the problem of adversarial online learning with temporal feedback graphs, where the learner's actions depend only on losses from specific past rounds. It presents a novel algorithm that achieves optimal regret bounds for transitive feedback graphs, with tight lower bounds in many practical settings.
We study a variant of prediction with expert advice where the learner's action at round $t$ is only allowed to depend on losses on a specific subset of the rounds (where the structure of which rounds' losses are visible at time $t$ is provided by a directed "feedback graph" known to the learner). We present a novel learning algorithm for this setting based on a strategy of partitioning the losses across sub-cliques of this graph. We complement this with a lower bound that is tight in many practical settings, and which we conjecture to be within a constant factor of optimal. For the important class of transitive feedback graphs, we prove that this algorithm is efficiently implementable and obtains the optimal regret bound (up to a universal constant).