LG · Dec 5, 2013

Bandit Online Optimization Over the Permutahedron

arXiv:1312.1530v2 · 21 citations
Originality: Incremental advance
AI Analysis

This work addresses computational efficiency in online learning for combinatorial optimization, offering a more practical solution for large-scale applications, though it is incremental as it builds on prior methods.

The paper tackles bandit online optimization over the permutahedron, where the earlier CombBand algorithm has an impractical running time because every step requires a matrix permanent approximation. The authors develop a new algorithm with regret O(n^{3/2}√T) and total time complexity O(n^3T), which is linear in the time horizon and therefore practical for large T.

The permutahedron is the convex polytope with vertex set consisting of the vectors $(π(1),\dots, π(n))$ for all permutations (bijections) $π$ over $\{1,\dots, n\}$. We study a bandit game in which, at each step $t$, an adversary chooses a hidden weight vector $s_t$, and a player chooses a vertex $π_t$ of the permutahedron and suffers an observed loss of $\sum_{i=1}^n π_t(i) s_t(i)$. A previous algorithm, CombBand of Cesa-Bianchi et al. (2009), guarantees a regret of $O(n\sqrt{T \log n})$ for a time horizon of $T$. Unfortunately, CombBand requires at each step an $n$-by-$n$ matrix permanent approximation whose accuracy must improve as $T$ grows, resulting in a total running time that is superlinear in $T$ and making it impractical for large time horizons. We provide an algorithm of regret $O(n^{3/2}\sqrt{T})$ with total time complexity $O(n^3T)$. The ideas are a combination of CombBand and a recent algorithm by Ailon (2013) for online optimization over the permutahedron in the full information setting. The technical core is a bound on the variance of the "pseudo loss" of the Plackett-Luce noisy sorting process. The bound is obtained by establishing positive semi-definiteness of a family of 3-by-3 matrices generated from rational functions of exponentials of 3 parameters.
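To make the game concrete, here is a minimal Python sketch of its ingredients: drawing a permutation from a Plackett-Luce model, computing the single observed loss, and finding the best fixed vertex in hindsight. It is not the paper's algorithm. The function names, the `scores` parameter, and the uniform-score player in `play_uniform` are all assumptions made for illustration, and the sketch omits the pseudo-loss estimate and score update that the abstract refers to.

```python
import math
import random


def sample_plackett_luce(scores, rng):
    """Draw a permutation from a Plackett-Luce model (illustrative helper).

    Items are picked one at a time without replacement, each with
    probability proportional to exp(score). Returns pi as a list where
    pi[i] in {1, ..., n} is the position given to item i.
    """
    n = len(scores)
    remaining = list(range(n))
    pi = [0] * n
    for position in range(1, n + 1):
        top = max(scores[i] for i in remaining)  # shift for numerical stability
        weights = [math.exp(scores[i] - top) for i in remaining]
        chosen = rng.choices(remaining, weights=weights, k=1)[0]
        pi[chosen] = position
        remaining.remove(chosen)
    return pi


def bandit_loss(pi, s):
    """The only feedback the player sees: sum_i pi(i) * s(i)."""
    return sum(p * w for p, w in zip(pi, s))


def best_vertex(weights):
    """Vertex minimizing sum_i pi(i) * weights[i].

    By the rearrangement inequality, the largest weight gets position 1
    and the smallest weight gets position n.
    """
    n = len(weights)
    order = sorted(range(n), key=lambda i: weights[i], reverse=True)
    pi = [0] * n
    for position, item in enumerate(order, start=1):
        pi[item] = position
    return pi


def play_uniform(weight_vectors, seed=0):
    """Regret of a naive player that never learns (hypothetical baseline).

    The player samples from Plackett-Luce with all scores equal, i.e. a
    uniformly random permutation each round. Regret is measured against
    the best single vertex for the cumulative weights.
    """
    rng = random.Random(seed)
    n = len(weight_vectors[0])
    scores = [0.0] * n
    total = 0.0
    cumulative = [0.0] * n
    for s in weight_vectors:
        pi = sample_plackett_luce(scores, rng)
        total += bandit_loss(pi, s)
        cumulative = [c + w for c, w in zip(cumulative, s)]
    best = bandit_loss(best_vertex(cumulative), cumulative)
    return total - best
```

Sequential sampling as written costs $O(n^2)$ per draw, which is simple rather than fast. A learning player would replace the fixed `scores` with values updated from the observed losses; how to do that with controlled variance is the subject of the paper and is not reproduced here.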
