LGMLOct 24, 2018

Learning to Route Efficiently with End-to-End Feedback: The Value of Networked Structure

arXiv:1810.10637v26 citations
Originality Highly original
AI Analysis

This work addresses routing efficiency in networked systems, offering incremental improvements through novel algorithmic connections between bandit learning and shortest path algorithms.

The paper tackles the problem of stochastic online shortest path routing with end-to-end feedback by introducing efficient algorithms that achieve nearly optimal regrets, leveraging the networked structure of the action set to overcome large-scale challenges. The results show superior regret performances and drastically reduced runtime in numerical experiments.

We introduce efficient algorithms which achieve nearly optimal regrets for the problem of stochastic online shortest path routing with end-to-end feedback. The setting is a natural application of the combinatorial stochastic bandits problem, a special case of the linear stochastic bandits problem. We show how the difficulties posed by the large scale action set can be overcome by the networked structure of the action set. Our approach presents a novel connection between bandit learning and shortest path algorithms. Our main contribution is an adaptive exploration algorithm with nearly optimal instance-dependent regret for any directed acyclic network. We then modify it so that nearly optimal worst case regret is achieved simultaneously. Driven by the carefully designed Top-Two Comparison (TTC) technique, the algorithms are efficiently implementable. We further conduct extensive numerical experiments to show that our proposed algorithms not only achieve superior regret performances, but also reduce the runtime drastically.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes