ML LGFeb 5, 2019

Contextual Bandits with Continuous Actions: Smoothing, Zooming, and Adapting

Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins, Chicheng Zhang

arXiv:1902.01520v422.873 citations

Originality Incremental advance

AI Analysis

This work addresses the challenge of efficient decision-making in continuous action spaces for online learning systems, representing an incremental advance with adaptive algorithms.

The paper tackles the problem of contextual bandit learning with continuous actions by deriving two regret bounds that adapt to problem difficulty and smoothness, achieving improved guarantees for benign problems without tuning.

We study contextual bandit learning with an abstract policy class and continuous action space. We obtain two qualitatively different regret bounds: one competes with a smoothed version of the policy class under no continuity assumptions, while the other requires standard Lipschitz assumptions. Both bounds exhibit data-dependent "zooming" behavior and, with no tuning, yield improved guarantees for benign problems. We also study adapting to unknown smoothness parameters, establishing a price-of-adaptivity and deriving optimal adaptive algorithms that require no additional information.

View on arXiv PDF

Similar