LG | Jul 9, 2025

Direct Regret Optimization in Bayesian Optimization

arXiv:2507.06529v1 | 7.11 citations | h-index: 3

Originality: Incremental advance

AI Analysis

This work addresses the challenge of myopic and hand-crafted methods in Bayesian optimization, offering improvements for applications like hyperparameter tuning, but it is incremental as it builds on existing BO frameworks.

The paper tackles the optimization of expensive black-box functions with Bayesian optimization. It proposes a direct regret optimization approach that jointly learns the model and a non-myopic acquisition strategy, and it reports lower simple regret and more robust exploration on benchmarks.

Bayesian optimization (BO) is a powerful paradigm for optimizing expensive black-box functions. Traditional BO methods typically rely on separate hand-crafted acquisition functions and surrogate models for the underlying function, and often operate in a myopic manner. In this paper, we propose a novel direct regret optimization approach that jointly learns the optimal model and non-myopic acquisition by distilling from a set of candidate models and acquisitions, and explicitly targets minimizing the multi-step regret. Our framework leverages an ensemble of Gaussian Processes (GPs) with varying hyperparameters to generate simulated BO trajectories, each guided by an acquisition function chosen from a pool of conventional choices, until a Bayesian early stop criterion is met. These simulated trajectories, capturing multi-step exploration strategies, are used to train an end-to-end decision transformer that directly learns to select next query points aimed at improving the ultimate objective. We further adopt a dense training--sparse learning paradigm: The decision transformer is trained offline with abundant simulated data sampled from ensemble GPs and acquisitions, while a limited number of real evaluations refine the GPs online. Experimental results on synthetic and real-world benchmarks suggest that our method consistently outperforms BO baselines, achieving lower simple regret and demonstrating more robust exploration in high-dimensional or noisy settings.
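The trajectory-generation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a 1-D grid search space, an RBF kernel, an ensemble that varies only the lengthscale, a two-member acquisition pool (EI and UCB), and a fixed evaluation budget in place of the Bayesian early-stop criterion. All function names are hypothetical, and the decision transformer that would be trained on these trajectories is omitted.

```python
import math
import numpy as np

def rbf_kernel(a, b, lengthscale):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_obs, y_obs, x_grid, lengthscale, jitter=1e-6):
    """Posterior mean and std of a zero-mean, unit-variance GP on x_grid."""
    K = rbf_kernel(x_obs, x_obs, lengthscale) + jitter * np.eye(len(x_obs))
    Ks = rbf_kernel(x_grid, x_obs, lengthscale)
    mean = Ks @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def acquisition(name, mean, std, best):
    """Two conventional acquisition functions (maximization convention)."""
    if name == "ucb":
        return mean + 2.0 * std
    # Expected improvement over the best observed value.
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def simulate_trajectory(rng, x_grid, lengthscale, acq_name, n_init=2, n_steps=8):
    """Sample a function from one GP prior and run BO on it with one acquisition."""
    n = len(x_grid)
    K = rbf_kernel(x_grid, x_grid, lengthscale) + 1e-6 * np.eye(n)
    f = np.linalg.cholesky(K) @ rng.standard_normal(n)  # simulated objective
    idx = [int(i) for i in rng.choice(n, size=n_init, replace=False)]
    regret = []
    for _ in range(n_steps):  # fixed budget; assumption replacing early stopping
        mean, std = gp_posterior(x_grid[idx], f[idx], x_grid, lengthscale)
        acq = acquisition(acq_name, mean, std, f[idx].max())
        acq[idx] = -np.inf  # do not query the same grid point twice
        idx.append(int(np.argmax(acq)))
        regret.append(float(f.max() - f[idx].max()))  # simple regret so far
    return {"lengthscale": lengthscale, "acq": acq_name,
            "queries": idx, "values": f[idx].tolist(), "regret": regret}

def build_dataset(seed=0, lengthscales=(0.1, 0.3, 0.5), acqs=("ei", "ucb")):
    """One simulated trajectory per (GP hyperparameter, acquisition) pair."""
    rng = np.random.default_rng(seed)
    x_grid = np.linspace(0.0, 1.0, 50)
    return [simulate_trajectory(rng, x_grid, ls, a)
            for ls in lengthscales for a in acqs]

if __name__ == "__main__":
    for traj in build_dataset():
        print(traj["lengthscale"], traj["acq"], round(traj["regret"][-1], 4))
```

Each record pairs a query sequence with the simple regret reached after every step, which is the kind of sequence data a decision-transformer-style model could be trained on offline. In the paper's setting the GPs would additionally be refined online with a limited number of real evaluations, which this sketch does not cover.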

