LG AI | May 4, 2023

High-Dimensional Bayesian Optimization via Semi-Supervised Learning with Optimized Unlabeled Data Sampling

arXiv:2305.02614v3 | 8.85 citations

Originality: Incremental advance

AI Analysis

This work addresses sample efficiency in Bayesian optimization, for researchers and practitioners working on high-dimensional optimization tasks. It is an incremental advance that combines existing paradigms in a new way.

The paper targets the cost of labeled data queries in high-dimensional Bayesian optimization. It introduces Teacher-Student Bayesian Optimization (TSBO), a semi-supervised learning approach that combines a teacher-student paradigm with optimized unlabeled data sampling to improve generalization and sample efficiency. The authors report significantly improved performance under tight labeled data budgets.

We introduce a novel semi-supervised learning approach, named Teacher-Student Bayesian Optimization ($\texttt{TSBO}$), which for the first time integrates the teacher-student paradigm into Bayesian optimization (BO) to minimize expensive labeled data queries. $\texttt{TSBO}$ incorporates a teacher model, an unlabeled data sampler, and a student model. The student is trained on unlabeled data locations generated by the sampler, with pseudo labels predicted by the teacher. The interplay between these three components implements a unique selective regularization of the teacher in the form of student feedback. This scheme enables the teacher to predict high-quality pseudo labels, enhancing the generalization of the GP surrogate model in the search space. To fully exploit $\texttt{TSBO}$, we propose two optimized unlabeled data samplers that construct effective student feedback aligned with the objective of Bayesian optimization. Furthermore, we quantify and leverage the uncertainty of the teacher-student model to provide reliable feedback to the teacher in the presence of risky pseudo-label predictions. $\texttt{TSBO}$ demonstrates significantly improved sample efficiency in several global optimization tasks under tight labeled data budgets.
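The data flow described in the abstract (sampler proposes unlabeled locations, teacher assigns pseudo labels, student trains on them and is scored against real labels) can be sketched as follows. This is an illustrative sketch only, not the paper's implementation: the kernel regressor stands in for both teacher and student, the uniform sampler stands in for the optimized samplers, and the names `rbf_kernel`, `fit_predict`, and `teacher_student_round` are hypothetical. The step in which the feedback is used to update the teacher, and the uncertainty weighting, are omitted.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.5):
    # Squared-exponential kernel between two sets of row vectors.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)


def fit_predict(x_train, y_train, x_query, noise=1e-3):
    # Kernel ridge regression, which matches a GP posterior mean.
    # Assumption: used here as a stand-in for both teacher and student.
    k = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    alpha = np.linalg.solve(k, y_train)
    return rbf_kernel(x_query, x_train) @ alpha


def teacher_student_round(x_lab, y_lab, n_unlabeled=64, seed=0):
    rng = np.random.default_rng(seed)
    dim = x_lab.shape[1]
    # Sampler (placeholder): uniform draws in the unit box, not an
    # optimized sampler.
    x_unl = rng.uniform(0.0, 1.0, size=(n_unlabeled, dim))
    # Teacher: fit on labeled data, then pseudo-label the unlabeled points.
    pseudo = fit_predict(x_lab, y_lab, x_unl)
    # Student: fit on pseudo-labeled data only, then scored on real labels.
    student_pred = fit_predict(x_unl, pseudo, x_lab)
    # Feedback: student error on the labeled set (mean squared error).
    feedback = float(np.mean((student_pred - y_lab) ** 2))
    return x_unl, pseudo, feedback
```

In a full method along the lines the abstract describes, the feedback value would drive an update of the teacher; here it is only computed and returned.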
