SCaLE: Switching Cost aware Learning and Exploration
This addresses a fundamental limitation in bandit optimization for applications requiring cost-aware decision-making, though it appears to be an incremental advance in algorithm design.
The paper tackles the problem of unbounded movement costs in bandit online convex optimization by developing the SCaLE algorithm, which achieves a distribution-agnostic sub-linear dynamic regret for stochastic environments with high-dimensional quadratic hitting costs and switching costs, as validated by numerical experiments showing statistical consistency.
This work addresses the fundamental problem of unbounded metric movement costs in bandit online convex optimization, by considering high-dimensional dynamic quadratic hitting costs and $\ell_2$-norm switching costs in a noisy bandit feedback model. For a general class of stochastic environments, we provide the first algorithm SCaLE that provably achieves a distribution-agnostic sub-linear dynamic regret, without the knowledge of hitting cost structure. En-route, we present a novel spectral regret analysis that separately quantifies eigenvalue-error driven regret and eigenbasis-perturbation driven regret. Extensive numerical experiments, against online-learning baselines, corroborate our claims, and highlight statistical consistency of our algorithm.