The Hung Tran

LG
h-index: 10
Papers: 3
Citations: 6
Novelty: 53%
AI Score: 43

3 Papers

LG · Apr 14
Black-Box Optimization From Small Offline Datasets via Meta Learning with Synthetic Tasks

Azza Fadhel, The Hung Tran, Trong Nghia Hoang et al.

We consider the problem of offline black-box optimization, where the goal is to discover optimal designs (e.g., molecules or materials) from past experimental data. A key challenge in this setting is data scarcity: in many scientific applications, only small or poor-quality datasets are available, which severely limits the effectiveness of existing algorithms. Prior work has shown, both theoretically and empirically, that the performance of offline optimization algorithms depends on how well the surrogate model captures the optimization bias (i.e., the ability to rank input designs correctly), which is challenging to accomplish with limited experimental data. This paper proposes Surrogate Learning with Optimization Bias via Synthetic Task Generation (OptBias), a meta-learning framework that directly tackles data scarcity. OptBias learns a reusable optimization bias by training on synthetic tasks generated from a Gaussian process, and then fine-tunes the surrogate model on the small dataset for the target task. Across diverse continuous and discrete offline optimization benchmarks, OptBias consistently outperforms state-of-the-art baselines in small-data regimes. These results highlight OptBias as a robust and practical solution for offline optimization in realistic small-data settings.
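
The two ingredients named in this abstract, synthetic tasks drawn from a Gaussian process and ranking quality as the measure of optimization bias, can be illustrated with a minimal NumPy sketch. This is not the OptBias implementation: the RBF kernel, the length scales, the jitter value, and the helper names `rbf_kernel`, `sample_synthetic_task`, and `pairwise_ranking_accuracy` are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between point sets a (n, d) and b (m, d)."""
    sq_dists = (np.sum(a**2, axis=1)[:, None]
                + np.sum(b**2, axis=1)[None, :]
                - 2.0 * a @ b.T)
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)

def sample_synthetic_task(x, length_scale, rng, jitter=1e-6):
    """Draw one function from a zero-mean GP prior, evaluated at the rows of x."""
    cov = rbf_kernel(x, x, length_scale) + jitter * np.eye(len(x))
    chol = np.linalg.cholesky(cov)
    return chol @ rng.standard_normal(len(x))

def pairwise_ranking_accuracy(y_true, y_pred):
    """Fraction of design pairs that y_pred orders the same way as y_true."""
    i, j = np.triu_indices(len(y_true), k=1)
    true_sign = np.sign(y_true[i] - y_true[j])
    pred_sign = np.sign(y_pred[i] - y_pred[j])
    valid = true_sign != 0  # ignore exact ties in the ground truth
    return float(np.mean(true_sign[valid] == pred_sign[valid]))

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(32, 4))  # hypothetical design inputs

# Assumed length scales; each one yields a different synthetic task.
tasks = [sample_synthetic_task(x, ls, rng) for ls in (0.5, 1.0, 2.0)]

y = tasks[0]
perfect = pairwise_ranking_accuracy(y, 2.0 * y)  # rescaling keeps every pair ordered
reversed_rank = pairwise_ranking_accuracy(y, -y)  # negation flips every pair
print(perfect, reversed_rank)
```

A surrogate can have large pointwise error and still rank designs well, which is why the ranking metric, rather than mean squared error, is the quantity sketched here.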

LG · Sep 19, 2025 · Code
ROOT: Rethinking Offline Optimization as Distributional Translation via Probabilistic Bridge

Manh Cuong Dao, The Hung Tran, Phi Le Nguyen et al.

This paper studies the black-box optimization task, which aims to find the maxima of a black-box function using a static set of its observed input-output pairs. This is often achieved by learning and optimizing a surrogate function with that offline data. Alternatively, it can be framed as an inverse modeling task that maps a desired performance to potential input candidates that achieve it. Both approaches are constrained by the limited amount of offline data. To mitigate this limitation, we introduce a new perspective that casts offline optimization as a distributional translation task. This is formulated as learning a probabilistic bridge that transforms an implicit distribution of low-value inputs (i.e., offline data) into another distribution of high-value inputs (i.e., solution candidates). Such a probabilistic bridge can be learned using low- and high-value inputs sampled from synthetic functions that resemble the target function. These synthetic functions are constructed as the posterior means of multiple Gaussian processes fitted with different parameterizations on the offline data, alleviating the data bottleneck. The proposed approach is evaluated on an extensive benchmark comprising the most recent methods, demonstrating significant improvement and establishing new state-of-the-art performance. Our code is publicly available at https://github.com/cuong-dm/ROOT.
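
As a rough illustration of the data-generation step described in this abstract, the sketch below builds synthetic functions as Gaussian process posterior means under several kernel length scales, then splits a candidate pool into low- and high-value inputs under each function. It is not the code from the ROOT repository: the toy objective, the RBF kernel, the length scales, the noise level, and the helper names are assumptions, and the probabilistic bridge itself is not shown.

```python
import numpy as np

def rbf_kernel(a, b, length_scale):
    """Squared-exponential kernel between point sets a (n, d) and b (m, d)."""
    sq_dists = (np.sum(a**2, axis=1)[:, None]
                + np.sum(b**2, axis=1)[None, :]
                - 2.0 * a @ b.T)
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)

def gp_posterior_mean(x_train, y_train, x_query, length_scale, noise=1e-2):
    """Posterior mean of a zero-mean GP at x_query, given noisy observations."""
    k_train = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    weights = np.linalg.solve(k_train, y_train)
    return rbf_kernel(x_query, x_train, length_scale) @ weights

rng = np.random.default_rng(0)

# Hypothetical offline dataset: inputs and values of a toy objective.
x_offline = rng.uniform(-2.0, 2.0, size=(40, 2))
y_offline = -np.sum(x_offline**2, axis=1)

# Candidate pool on which each synthetic function is evaluated.
x_pool = rng.uniform(-2.0, 2.0, size=(500, 2))

training_pairs = []
for length_scale in (0.5, 1.0, 2.0):  # assumed parameterizations
    values = gp_posterior_mean(x_offline, y_offline, x_pool, length_scale)
    order = np.argsort(values)
    low_idx, high_idx = order[:50], order[-50:]
    training_pairs.append({
        "low_inputs": x_pool[low_idx],
        "high_inputs": x_pool[high_idx],
        "low_values": values[low_idx],
        "high_values": values[high_idx],
    })
```

Each entry of `training_pairs` holds one source set (low-value inputs) and one target set (high-value inputs) for a single synthetic function, which is the kind of paired data a translation model between the two distributions would need.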

LG · Dec 21, 2024
High-Dimensional Bayesian Optimization via Random Projection of Manifold Subspaces

Quoc-Anh Hoang Nguyen, The Hung Tran

Bayesian Optimization (BO) is a popular approach to optimizing expensive-to-evaluate black-box functions. Despite the success of BO, its performance may decrease exponentially as the dimensionality increases. A common framework to tackle this problem is to assume that the objective function depends on a limited set of features that lie on a low-dimensional manifold embedded in the high-dimensional ambient space. The latent space can be linear or, more generally, nonlinear. To learn the feature mapping, existing works usually use an encoder-decoder framework, which is either computationally expensive or susceptible to overfitting when the labeled data is limited. This paper proposes a new approach for BO in high dimensions by exploiting a new representation of the objective function. Our approach combines a random linear projection, which reduces the dimensionality, with representation learning of the nonlinear manifold. When the geometry of the latent manifold is available, a solution to exploit this geometry is proposed for representation learning. Otherwise, we use a neural network. To mitigate the overfitting that the neural network may introduce, we train the feature mapping in a geometry-aware semi-supervised manner. Our approach enables efficient optimization of BO's acquisition function in the low-dimensional space, with the advantage of projecting back to the original high-dimensional space compared to existing works in the same setting. Finally, we show empirically that our algorithm outperforms other high-dimensional BO baselines on various synthetic functions and real applications.
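
The random linear projection and the mapping back to the ambient space mentioned in this abstract can be sketched generically as follows. This covers only a plain Gaussian projection with a pseudo-inverse back-projection, which is an assumption for illustration; the paper's nonlinear representation learning and its semi-supervised training are not shown, and the dimensions and helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
ambient_dim, latent_dim = 100, 5  # assumed sizes for illustration

# Random Gaussian projection, scaled so squared norms are preserved in expectation.
projection = rng.standard_normal((latent_dim, ambient_dim)) / np.sqrt(latent_dim)

# Moore-Penrose pseudo-inverse, used to map latent points back to the ambient space.
back_projection = np.linalg.pinv(projection)  # shape (ambient_dim, latent_dim)

def to_latent(x):
    """Project rows of x from the ambient space to the low-dimensional space."""
    return x @ projection.T

def to_ambient(z):
    """Map rows of z to the minimum-norm ambient points that project onto them."""
    return z @ back_projection.T

x = rng.uniform(-1.0, 1.0, size=(8, ambient_dim))
z = to_latent(x)        # points where an acquisition function could be optimized
x_back = to_ambient(z)  # candidates expressed in the original space

# x_back generally differs from x, but it projects to the same latent point.
consistent = np.allclose(to_latent(x_back), z)
```

The pseudo-inverse returns the minimum-norm preimage, so information orthogonal to the projection is lost; learning a better mapping from data is exactly the gap that representation learning is meant to fill.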