LGMLDec 11, 2020

Smooth Bandit Optimization: Generalization to Hölder Space

arXiv:2012.06076v116 citations
AI Analysis

This work provides optimal regret rates for bandit optimization with higher-order smooth functions, which is important for researchers working on theoretical aspects of bandit algorithms.

This paper addresses the problem of bandit optimization for reward functions in Hölder space with exponent α > 1, bridging the gap between Lipschitz and infinitely-differentiable models. The authors propose a two-layer algorithm that achieves an optimal regret upper bound of Õ(T^((d+α)/(d+2α))) for α > 1, matching existing lower bounds. They also demonstrate adaptation to unknown function smoothness, matching existing lower bounds for α ≤ 1.

We consider bandit optimization of a smooth reward function, where the goal is cumulative regret minimization. This problem has been studied for $α$-Hölder continuous (including Lipschitz) functions with $0<α\leq 1$. Our main result is in generalization of the reward function to Hölder space with exponent $α>1$ to bridge the gap between Lipschitz bandits and infinitely-differentiable models such as linear bandits. For Hölder continuous functions, approaches based on random sampling in bins of a discretized domain suffices as optimal. In contrast, we propose a class of two-layer algorithms that deploy misspecified linear/polynomial bandit algorithms in bins. We demonstrate that the proposed algorithm can exploit higher-order smoothness of the function by deriving a regret upper bound of $\tilde{O}(T^\frac{d+α}{d+2α})$ for when $α>1$, which matches existing lower bound. We also study adaptation to unknown function smoothness over a continuous scale of Hölder spaces indexed by $α$, with a bandit model selection approach applied with our proposed two-layer algorithms. We show that it achieves regret rate that matches the existing lower bound for adaptation within the $α\leq 1$ subset.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes