ML LG | Nov 25, 2024

Alpha Entropy Search for New Information-based Bayesian Optimization

Daniel Fernández-Sánchez, Eduardo C. Garrido-Merchán, Daniel Hernández-Lobato

arXiv:2411.16586v15.52 citations | h-index: 2 | Has Code | Knowledge-Based Systems

Originality: Incremental advance

AI Analysis

This work addresses the need for more flexible and efficient acquisition functions in Bayesian optimization, particularly for practitioners in machine learning and optimization, though it is incremental as it builds on existing information-based methods.

The paper aims to improve Bayesian optimization by introducing Alpha Entropy Search (AES), a new class of acquisition functions based on the α-divergence, which generalizes the KL divergence. A tunable parameter α trades off measuring differences between distributions at a single mode against measuring them globally. Experiments on synthetic, benchmark, and real-world tasks, including hyperparameter tuning for a deep neural network, show that AES performs competitively with other information-based methods such as JES, MES, and PES.

Bayesian optimization (BO) methods based on information theory have obtained state-of-the-art results in several tasks. These techniques rely heavily on the Kullback-Leibler (KL) divergence to compute the acquisition function. In this work, we introduce a novel information-based class of acquisition functions for BO called Alpha Entropy Search (AES). AES is based on the α-divergence, which generalizes the KL divergence. At each iteration, AES selects the next evaluation point as the one whose associated target value has the highest level of dependency with respect to the location and associated value of the global maximum of the optimization problem. Dependency is measured in terms of the α-divergence, as an alternative to the KL divergence. Intuitively, this favors evaluating the objective function at the points that are most informative about the global maximum. The α-divergence has a free parameter α, which determines the behavior of the divergence, trading off between evaluating differences between distributions at a single mode and evaluating differences globally. Therefore, different values of α result in different acquisition functions. The AES acquisition function lacks a closed-form expression. However, we propose an efficient and accurate approximation using a truncated Gaussian distribution. In practice, the value of α can be chosen by the practitioner, but here we suggest using a combination of acquisition functions obtained by simultaneously considering a range of values of α. We provide an implementation of AES in BoTorch and evaluate its performance in synthetic, benchmark, and real-world experiments, the latter involving the tuning of the hyper-parameters of a deep neural network. These experiments show that the performance of AES is competitive with that of other information-based acquisition functions such as JES, MES, and PES.
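
To illustrate how the α-divergence generalizes the KL divergence, here is a minimal numerical sketch. It assumes Amari's parameterization, D_α(p‖q) = (1 − ∫ p^α q^(1−α) dx) / (α(1 − α)), which the abstract does not specify; the paper may use a different but related form. The two Gaussians, the integration grid, and all function names are illustrative choices, not part of the AES implementation.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian evaluated on the array x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def alpha_divergence(p, q, dx, alpha):
    """Amari alpha-divergence (assumed form) via a Riemann sum on a uniform grid.

    Valid for alpha not in {0, 1}; those two values are defined as limits.
    """
    integral = np.sum(p ** alpha * q ** (1.0 - alpha)) * dx
    return (1.0 - integral) / (alpha * (1.0 - alpha))


def kl_divergence(p, q, dx):
    """KL(p || q) via a Riemann sum on the same grid."""
    return np.sum(p * np.log(p / q)) * dx


# Illustrative densities: p = N(0, 1), q = N(1, 1.5^2).
x = np.linspace(-12.0, 12.0, 20001)
dx = x[1] - x[0]
p = gaussian_pdf(x, 0.0, 1.0)
q = gaussian_pdf(x, 1.0, 1.5)

kl_pq = kl_divergence(p, q, dx)
kl_qp = kl_divergence(q, p, dx)
near_one = alpha_divergence(p, q, dx, 0.999)   # approaches KL(p || q)
near_zero = alpha_divergence(p, q, dx, 0.001)  # approaches KL(q || p)
half = alpha_divergence(p, q, dx, 0.5)         # symmetric in p and q

print(f"KL(p||q) = {kl_pq:.4f}, D_alpha at alpha=0.999: {near_one:.4f}")
print(f"KL(q||p) = {kl_qp:.4f}, D_alpha at alpha=0.001: {near_zero:.4f}")
print(f"D_alpha at alpha=0.5: {half:.4f}")
```

Under this parameterization, α close to 1 recovers KL(p‖q), α close to 0 recovers the reverse direction KL(q‖p), and α = 0.5 gives a symmetric quantity proportional to the squared Hellinger distance. Sweeping α between these values is what yields the different acquisition functions mentioned above.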
