LG AI MLMay 30, 2014

Learning to Act Greedily: Polymatroid Semi-Bandits

Branislav Kveton, Zheng Wen, Azin Ashkan, Michal Valko

arXiv:1405.7752v310 citations

Originality Incremental advance

AI Analysis

This work addresses learning variants of optimization problems like minimum spanning tree for scenarios where the model is unknown, though it appears incremental as it builds on existing bandit and polymatroid frameworks.

The paper tackles the problem of learning to maximize an unknown modular function on a known polymatroid in a bandit setting, proposing a computationally efficient algorithm with tight regret bounds, and demonstrates its practicality on three problems.

Many important optimization problems, such as the minimum spanning tree and minimum-cost flow, can be solved optimally by a greedy method. In this work, we study a learning variant of these problems, where the model of the problem is unknown and has to be learned by interacting repeatedly with the environment in the bandit setting. We formalize our learning problem quite generally, as learning how to maximize an unknown modular function on a known polymatroid. We propose a computationally efficient algorithm for solving our problem and bound its expected cumulative regret. Our gap-dependent upper bound is tight up to a constant and our gap-free upper bound is tight up to polylogarithmic factors. Finally, we evaluate our method on three problems and demonstrate that it is practical.

View on arXiv PDF

Similar