LG · May 12

Efficient Conditioning: Why Pseudo-Observation Batch Bayesian Optimization Works and When It Does Not

arXiv:2605.18819 · 7.6

Predicted impact: top 94% in LG · last 90 days · Originality: Highly original

AI Analysis

Provides a unified theoretical framework for understanding when and why batch Bayesian optimization methods work, addressing a long-standing gap for practitioners and researchers.

The paper identifies efficient conditioning as the key property for successful batch Bayesian optimization, proves that Gaussian Processes produce distinct batch points, and unifies existing batch selection methods under a single conditioning mechanism. Experiments on synthetic and real-world benchmarks validate the theoretical predictions, showing that GP-based conditioning matches or outperforms explicit penalties, while parametric surrogates produce degenerate batches.
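For intuition, here is the standard closed-form GP update that "efficient conditioning" refers to, written in our own notation rather than the paper's. Assume a GP posterior after n observations with mean μ_n, posterior covariance k_n (so σ_n²(x) = k_n(x, x)), and Gaussian observation noise of variance σ_ε². Conditioning on a pseudo-observation ỹ at the selected point x₁ gives the update below; Kriging Believer (KB), Constant Liar (CL), and fantasy models differ only in how ỹ is chosen (the fantasy line shows one common choice, a draw from the noisy posterior predictive).

```latex
\begin{aligned}
\mu_{n+1}(x) &= \mu_n(x) + \frac{k_n(x, x_1)}{\sigma_n^2(x_1) + \sigma_\varepsilon^2}\,\bigl(\tilde{y} - \mu_n(x_1)\bigr),\\
\sigma_{n+1}^2(x) &= \sigma_n^2(x) - \frac{k_n(x, x_1)^2}{\sigma_n^2(x_1) + \sigma_\varepsilon^2},\\
\tilde{y} &=
\begin{cases}
\mu_n(x_1) & \text{(KB)},\\
L & \text{(CL, a fixed constant)},\\
\text{a draw from } \mathcal{N}\bigl(\mu_n(x_1),\, \sigma_n^2(x_1) + \sigma_\varepsilon^2\bigr) & \text{(fantasy)}.
\end{cases}
\end{aligned}
```

The variance update never involves ỹ, so the uncertainty reduction around x₁ is identical for all three strategies and only the mean shifts. As σ_ε² → 0, σ²_{n+1}(x₁) → 0, which gives some intuition for why an acquisition function that is non-decreasing in posterior uncertainty tends to place the next batch point away from x₁.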

Constant Liar (CL), Kriging Believer (KB), and fantasy models are widely used for batch selection in parallel Bayesian Optimization, yet a unified theory explaining why they work, and the conditions under which they fail, has been lacking. We identify efficient conditioning as the key surrogate property: the ability to update predictions in closed form when the data are augmented. We prove that Gaussian Processes satisfy this requirement, producing provably distinct batch points with separation of order ℓ, and that this holds for any acquisition function that is monotonically non-decreasing in posterior uncertainty (EI, UCB, PI), with qualitatively similar behavior for Thompson Sampling. We unify CL, KB, and fantasy models as instances of a single conditioning mechanism that differ only in the distribution of the lie value, and we draw quantitative connections to Local Penalization (LP) and qualitative connections to Determinantal Point Processes (DPPs). To disentangle model structure from optimizer randomness, we introduce the Structural Diversity Diagnostic (SDD), a reusable methodology for testing surrogate compatibility. Experiments on Hartmann6D, Ackley8D, Levy10D, and SVM hyperparameter tuning validate all theoretical predictions: the implicit penalty of CL/KB matches or outperforms explicit LP; greedy conditioning achieves convergence on par with joint qEI; efficient conditioning extends to Multiquadric RBF networks; and parametric surrogates produce degenerate batches even when fully retrained (random forests), while neural networks regain diversity only at 15× the wall-clock cost of GP conditioning. Robustness is confirmed across multiple initial datasets and under observation noise.
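The greedy loop shared by CL, KB, and fantasy batches can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the authors' code: a 1-D toy problem, a squared-exponential kernel with unit signal variance and lengthscale 0.1, a UCB acquisition, a toy objective sin(6x), and the hypothetical helpers `GridGP` and `select_batch`. Restricting the GP to a fixed grid makes each conditioning step an exact rank-one update of the mean vector and covariance matrix, so no matrix inversion is needed.

```python
import copy
import math
import random


def rbf(a, b, lengthscale):
    # Squared-exponential kernel with unit signal variance (assumed choice).
    return math.exp(-((a - b) ** 2) / (2.0 * lengthscale ** 2))


class GridGP:
    """Hypothetical helper: exact zero-mean GP posterior over a fixed 1-D grid."""

    def __init__(self, grid, lengthscale=0.1, noise_var=1e-6):
        self.grid = list(grid)
        self.noise_var = noise_var
        self.mean = [0.0] * len(self.grid)
        self.cov = [[rbf(a, b, lengthscale) for b in self.grid] for a in self.grid]

    def condition(self, j, y):
        """Closed-form update after observing (or pretending to observe) y at grid[j]."""
        denom = self.cov[j][j] + self.noise_var
        col = [row[j] for row in self.cov]
        resid = y - self.mean[j]
        self.mean = [m + c * resid / denom for m, c in zip(self.mean, col)]
        # The covariance update never uses y: the lie value only moves the mean.
        self.cov = [[cik - ci * ck / denom for cik, ck in zip(row, col)]
                    for row, ci in zip(self.cov, col)]

    def ucb(self, beta):
        """Upper confidence bound (mean + beta * std) at every grid point."""
        return [m + beta * math.sqrt(max(row[i], 0.0))
                for i, (m, row) in enumerate(zip(self.mean, self.cov))]


def select_batch(gp, q, lie="kb", beta=2.0, cl_value=0.0, seed=0):
    """Hypothetical greedy batch selection; strategies differ only in the lie value."""
    rng = random.Random(seed)
    gp = copy.deepcopy(gp)  # keep the caller's posterior unchanged
    batch = []
    for _ in range(q):
        scores = gp.ucb(beta)
        j = max(range(len(scores)), key=scores.__getitem__)
        batch.append(gp.grid[j])
        if lie == "kb":  # Kriging Believer: lie = current posterior mean
            y = gp.mean[j]
        elif lie == "cl":  # Constant Liar: a fixed constant
            y = cl_value
        elif lie == "fantasy":  # one draw from the noisy posterior predictive
            y = rng.gauss(gp.mean[j], math.sqrt(max(gp.cov[j][j], 0.0) + gp.noise_var))
        else:
            raise ValueError(f"unknown lie strategy: {lie}")
        gp.condition(j, y)
    return batch


if __name__ == "__main__":
    grid = [i / 100 for i in range(101)]
    gp = GridGP(grid, lengthscale=0.1)
    data_idx = [10, 50, 90]
    ys = [math.sin(6 * grid[j]) for j in data_idx]  # toy objective (assumption)
    for j, y in zip(data_idx, ys):
        gp.condition(j, y)
    for lie in ("kb", "cl", "fantasy"):
        # CL uses the worst observed value, a pessimistic lie when maximizing.
        print(lie, select_batch(gp, q=4, lie=lie, cl_value=min(ys)))
```

Changing the `lie` argument alters only the value passed to `condition`, mirroring the claim that the three methods differ only in the lie value distribution; the covariance, and with it the uncertainty-driven spreading of batch points, is updated identically in every case. The grid restriction is a simplification for readability; a practical implementation would optimize the acquisition function over a continuous domain.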
