Towards Practical Lipschitz Bandits
This work addresses the challenge of efficient exploration-exploitation in bandit algorithms for practical applications such as hyperparameter tuning, representing an incremental improvement with a novel model.
The paper tackles the problem of optimizing rewards and minimizing regret in Lipschitz bandit problems by introducing a framework that adaptively learns partitions of context- and arm-space, linking tree-based methods to Gaussian processes and designing a novel hierarchical Bayesian model. The result is state-of-the-art performance in real-world tasks like neural network hyperparameter tuning.
Stochastic Lipschitz bandit algorithms balance exploration and exploitation, and have been used for a variety of important task domains. In this paper, we present a framework for Lipschitz bandit methods that adaptively learns partitions of context- and arm-space. Due to this flexibility, the algorithm is able to efficiently optimize rewards and minimize regret, by focusing on the portions of the space that are most relevant. In our analysis, we link tree-based methods to Gaussian processes. In light of our analysis, we design a novel hierarchical Bayesian model for Lipschitz bandit problems. Our experiments show that our algorithms can achieve state-of-the-art performance in challenging real-world tasks such as neural network hyperparameter tuning.