LGMLOct 23, 2018

Online learning with feedback graphs and switching costs

arXiv:1810.09666v218 citations
AI Analysis

This work addresses the problem of designing efficient online learning algorithms for scenarios with partial feedback and switching costs, which is incremental as it builds on existing graph-based feedback models.

The paper tackles online learning with partial feedback graphs and switching costs, establishing a lower bound on expected regret for general graphs and showing that optimal algorithms without switching costs become suboptimal when switching costs are introduced. It proposes two new algorithms, Threshold Based EXP3 and EXP3.SC, which achieve order-optimal regret in certain settings and outperform prior methods in empirical evaluations.

We study online learning when partial feedback information is provided following every action of the learning process, and the learner incurs switching costs for changing his actions. In this setting, the feedback information system can be represented by a graph, and previous works studied the expected regret of the learner in the case of a clique (Expert setup), or disconnected single loops (Multi-Armed Bandits (MAB)). This work provides a lower bound on the expected regret in the Partial Information (PI) setting, namely for general feedback graphs --excluding the clique. Additionally, it shows that all algorithms that are optimal without switching costs are necessarily sub-optimal in the presence of switching costs, which motivates the need to design new algorithms. We propose two new algorithms: Threshold Based EXP3 and EXP3. SC. For the two special cases of symmetric PI setting and MAB, the expected regret of both of these algorithms is order optimal in the duration of the learning process. Additionally, Threshold Based EXP3 is order optimal in the switching cost, whereas EXP3. SC is not. Finally, empirical evaluations show that Threshold Based EXP3 outperforms the previously proposed order-optimal algorithms EXP3 SET in the presence of switching costs, and Batch EXP3 in the MAB setting with switching costs.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes