LG · ML · Jul 17, 2020

Bandits for BMO Functions

arXiv:2007.08703v1 · 5 citations
Originality: Incremental advance
AI Analysis

This paper addresses a theoretical challenge in bandit optimization for signals with infinities, but it appears incremental because it extends existing bandit frameworks to a specific function class.

The paper tackles the bandit problem with expected rewards modeled as Bounded Mean Oscillation (BMO) functions, which can be discontinuous and unbounded, and develops an algorithm achieving poly-log δ-regret by competing against an arm optimal after removing a δ-sized portion of the arm space.

We study the bandit problem where the underlying expected reward is a Bounded Mean Oscillation (BMO) function. BMO functions are allowed to be discontinuous and unbounded, and are useful in modeling signals with infinities in the domain. We develop a toolset for BMO bandits, and provide an algorithm that can achieve poly-log $δ$-regret -- a regret measured against an arm that is optimal after removing a $δ$-sized portion of the arm space.
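
For orientation, the two notions the abstract relies on can be sketched as follows. The BMO seminorm is the standard one. The formalization of $δ$-regret is an assumption reconstructed from the abstract's wording and may differ from the paper's exact definition; the symbols $D$ (arm space), $\mu$ (a measure on $D$), $f^{\delta}$, and $a_t$ (arm pulled at round $t$) are illustrative notation, not taken from the source.

$$
\|f\|_{\mathrm{BMO}} = \sup_{Q} \frac{1}{|Q|} \int_{Q} \left| f(x) - f_Q \right| dx,
\qquad
f_Q = \frac{1}{|Q|} \int_{Q} f(x)\, dx,
$$

where the supremum runs over cubes $Q$ in the domain, and $f$ is BMO when this quantity is finite.

$$
f^{\delta} = \inf \left\{ z \in \mathbb{R} : \mu\left( \{ a \in D : f(a) > z \} \right) \le \delta \right\},
\qquad
r_t^{\delta} = \max \left\{ 0,\; f^{\delta} - f(a_t) \right\}.
$$

As a worked instance under these assumptions, take $f(x) = -\log x$ on $D = (0, 1]$ with Lebesgue measure, a classic BMO function that is unbounded near $0$. For $z \ge 0$, $\mu(\{x : -\log x > z\}) = e^{-z}$, so $f^{\delta} = \log(1/\delta)$ for $\delta \in (0, 1]$: the benchmark is finite even though $\sup f = \infty$.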

