Haokai Hong

h-index9

7papers

85citations

Novelty54%

AI Score34

Ranked #127,701 of 201,326 authors (top 63%)#407 in NE (top 36%)

7 Papers

NEApr 8, 2023

Improving Performance Insensitivity of Large-scale Multiobjective Optimization via Monte Carlo Tree Search

Haokai Hong, Min Jiang, Gary G. Yen

The large-scale multiobjective optimization problem (LSMOP) is characterized by simultaneously optimizing multiple conflicting objectives and involving hundreds of decision variables. Many real-world applications in engineering fields can be modeled as LSMOPs; simultaneously, engineering applications require insensitivity in performance. This requirement usually means that the results from the algorithm runs should not only be good for every run in terms of performance but also that the performance of multiple runs should not fluctuate too much, i.e., the algorithm shows good insensitivity. Considering that substantial computational resources are requested for each run, it is essential to improve upon the performance of the large-scale multiobjective optimization algorithm, as well as the insensitivity of the algorithm. However, existing large-scale multiobjective optimization algorithms solely focus on improving the performance of the algorithms, leaving the insensitivity characteristics unattended. In this work, we propose an evolutionary algorithm for solving LSMOPs based on Monte Carlo tree search, the so-called LMMOCTS, which aims to improve the performance and insensitivity for large-scale multiobjective optimization problems. The proposed method samples the decision variables to construct new nodes on the Monte Carlo tree for optimization and evaluation. It selects nodes with good evaluation for further search to reduce the performance sensitivity caused by large-scale decision variables. We compare the proposed algorithm with several state-of-the-art designs on different benchmark functions. We also propose two metrics to measure the sensitivity of the algorithm. The experimental results confirm the effectiveness and performance insensitivity of the proposed design for solving large-scale multiobjective optimization problems.

NEApr 8, 2023

Efficiently Tackling Million-Dimensional Multiobjective Problems: A Direction Sampling and Fine-Tuning Approach

Haokai Hong, Min Jiang, Qiuzhen Lin et al.

We define very large-scale multiobjective optimization problems as optimizing multiple objectives (VLSMOPs) with more than 100,000 decision variables. These problems hold substantial significance, given the ubiquity of real-world scenarios necessitating the optimization of hundreds of thousands, if not millions, of variables. However, the larger dimension in VLSMOPs intensifies the curse of dimensionality and poses significant challenges for existing large-scale evolutionary multiobjective algorithms, rendering them more difficult to solve within the constraints of practical computing resources. To overcome this issue, we propose a novel approach called the very large-scale multiobjective optimization framework (VMOF). The method efficiently samples general yet suitable evolutionary directions in the very large-scale space and subsequently fine-tunes these directions to locate the Pareto-optimal solutions. To sample the most suitable evolutionary directions for different solutions, Thompson sampling is adopted for its effectiveness in recommending from a very large number of items within limited historical evaluations. Furthermore, a technique is designed for fine-tuning directions specific to tracking Pareto-optimal solutions. To understand the designed framework, we present our analysis of the framework and then evaluate VMOF using widely recognized benchmarks and real-world problems spanning dimensions from 100 to 1,000,000. Experimental results demonstrate that our method exhibits superior performance not only on LSMOPs but also on VLSMOPs when compared to existing algorithms.

LGMay 24, 2024Code

Accelerating 3D Molecule Generation via Jointly Geometric Optimal Transport

Haokai Hong, Wanyu Lin, Kay Chen Tan

This paper proposes a new 3D molecule generation framework, called GOAT, for fast and effective 3D molecule generation based on the flow-matching optimal transport objective. Specifically, we formulate a geometric transport formula for measuring the cost of mapping multi-modal features (e.g., continuous atom coordinates and categorical atom types) between a base distribution and a target data distribution. Our formula is solved within a joint, equivariant, and smooth representation space. This is achieved by transforming the multi-modal features into a continuous latent space with equivariant networks. In addition, we find that identifying optimal distributional coupling is necessary for fast and effective transport between any two distributions. We further propose a mechanism for estimating and purifying optimal coupling to train the flow model with optimal transport. By doing so, GOAT can turn arbitrary distribution couplings into new deterministic couplings, leading to an estimated optimal transport plan for fast 3D molecule generation. The purification filters out the subpar molecules to ensure the ultimate generation quality. We theoretically and empirically prove that the proposed optimal coupling estimation and purification yield transport plan with non-increasing cost. Finally, extensive experiments show that GOAT enjoys the efficiency of solving geometric optimal transport, leading to a double speedup compared to the sub-optimal method while achieving the best generation quality regarding validity, uniqueness, and novelty. The code is available at https://github.com/WanyuGroup/ICLR2025-GOAT.

LGApr 1, 2024

Diffusion-Driven Domain Adaptation for Generating 3D Molecules

Haokai Hong, Wanyu Lin, Kay Chen Tan

Can we train a molecule generator that can generate 3D molecules from a new domain, circumventing the need to collect data? This problem can be cast as the problem of domain adaptive molecule generation. This work presents a novel and principled diffusion-based approach, called GADM, that allows shifting a generative model to desired new domains without the need to collect even a single molecule. As the domain shift is typically caused by the structure variations of molecules, e.g., scaffold variations, we leverage a designated equivariant masked autoencoder (MAE) along with various masking strategies to capture the structural-grained representations of the in-domain varieties. In particular, with an asymmetric encoder-decoder module, the MAE can generalize to unseen structure variations from the target domains. These structure variations are encoded with an equivariant encoder and treated as domain supervisors to control denoising. We show that, with these encoded structural-grained domain supervisors, GADM can generate effective molecules within the desired new domains. We conduct extensive experiments across various domain adaptation tasks over benchmarking datasets. We show that our approach can improve up to 65.6% in terms of success rate defined based on molecular validity, uniqueness, and novelty compared to alternative baselines.

CRFeb 25, 2025

PII-Bench: Evaluating Query-Aware Privacy Protection Systems

Hao Shen, Zhouhong Gu, Haokai Hong et al.

The widespread adoption of Large Language Models (LLMs) has raised significant privacy concerns regarding the exposure of personally identifiable information (PII) in user prompts. To address this challenge, we propose a query-unrelated PII masking strategy and introduce PII-Bench, the first comprehensive evaluation framework for assessing privacy protection systems. PII-Bench comprises 2,842 test samples across 55 fine-grained PII categories, featuring diverse scenarios from single-subject descriptions to complex multi-party interactions. Each sample is carefully crafted with a user query, context description, and standard answer indicating query-relevant PII. Our empirical evaluation reveals that while current models perform adequately in basic PII detection, they show significant limitations in determining PII query relevance. Even state-of-the-art LLMs struggle with this task, particularly in handling complex multi-subject scenarios, indicating substantial room for improvement in achieving intelligent PII masking.

NEJul 16, 2021

Solving Large-Scale Multi-Objective Optimization via Probabilistic Prediction Model

Haokai Hong, Kai Ye, Min Jiang et al.

The main feature of large-scale multi-objective optimization problems (LSMOP) is to optimize multiple conflicting objectives while considering thousands of decision variables at the same time. An efficient LSMOP algorithm should have the ability to escape the local optimal solution from the huge search space and find the global optimal. Most of the current researches focus on how to deal with decision variables. However, due to the large number of decision variables, it is easy to lead to high computational cost. Maintaining the diversity of the population is one of the effective ways to improve search efficiency. In this paper, we propose a probabilistic prediction model based on trend prediction model and generating-filtering strategy, called LT-PPM, to tackle the LSMOP. The proposed method enhances the diversity of the population through importance sampling. At the same time, due to the adoption of an individual-based evolution mechanism, the computational cost of the proposed method is independent of the number of decision variables, thus avoiding the problem of exponential growth of the search space. We compared the proposed algorithm with several state-of-the-art algorithms for different benchmark functions. The experimental results and complexity analysis have demonstrated that the proposed algorithm has significant improvement in terms of its performance and computational efficiency in large-scale multi-objective optimization.

NEJan 8, 2021

Manifold Interpolation for Large-Scale Multi-Objective Optimization via Generative Adversarial Networks

Zhenzhong Wang, Haokai Hong, Kai Ye et al.

Large-scale multiobjective optimization problems (LSMOPs) are characterized as involving hundreds or even thousands of decision variables and multiple conflicting objectives. An excellent algorithm for solving LSMOPs should find Pareto-optimal solutions with diversity and escape from local optima in the large-scale search space. Previous research has shown that these optimal solutions are uniformly distributed on the manifold structure in the low-dimensional space. However, traditional evolutionary algorithms for solving LSMOPs have some deficiencies in dealing with this structural manifold, resulting in poor diversity, local optima, and inefficient searches. In this work, a generative adversarial network (GAN)-based manifold interpolation framework is proposed to learn the manifold and generate high-quality solutions on this manifold, thereby improving the performance of evolutionary algorithms. We compare the proposed algorithm with several state-of-the-art algorithms on large-scale multiobjective benchmark functions. Experimental results have demonstrated the significant improvements achieved by this framework in solving LSMOPs.