Weijie Zheng

NE
h-index: 12
18 papers
321 citations
Novelty: 54%
AI Score: 51

18 Papers

NE · Nov 23, 2022
Runtime Analysis for the NSGA-II: Proving, Quantifying, and Explaining the Inefficiency For Many Objectives

Weijie Zheng, Benjamin Doerr

The NSGA-II is one of the most prominent algorithms for solving multi-objective optimization problems. Despite numerous successful applications, several studies have shown that the NSGA-II is less effective for larger numbers of objectives. In this work, we use mathematical runtime analyses to rigorously demonstrate and quantify this phenomenon. We show that even on the simple $m$-objective generalization of the discrete OneMinMax benchmark, where every solution is Pareto optimal, the NSGA-II, even with large population sizes, cannot compute the full Pareto front (objective vectors of all Pareto optima) in sub-exponential time when the number of objectives is at least three. The reason for this unexpected behavior lies in the fact that in the computation of the crowding distance, the different objectives are regarded independently. This is not a problem for two objectives, where any sorting of a pairwise incomparable set of solutions according to one objective is also such a sorting according to the other objective (in the inverse order).
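
The per-objective nature of the crowding distance mentioned in this abstract can be illustrated with a minimal Python sketch of the classic computation. The function name `crowding_distance` and the example points are illustrative assumptions, not taken from the paper, and ties are broken by input order, which is only one of several possible conventions.

```python
import math

def crowding_distance(points):
    """Classic crowding distance for a list of objective vectors (tuples)."""
    n = len(points)
    if n == 0:
        return []
    m = len(points[0])
    dist = [0.0] * n
    for j in range(m):
        # Sort by the j-th objective only; all other objectives are ignored here.
        order = sorted(range(n), key=lambda i: points[i][j])
        lo, hi = points[order[0]][j], points[order[-1]][j]
        # Boundary solutions of each objective get infinite distance.
        dist[order[0]] = dist[order[-1]] = math.inf
        if hi == lo:
            continue
        for pos in range(1, n - 1):
            gap = points[order[pos + 1]][j] - points[order[pos - 1]][j]
            dist[order[pos]] += gap / (hi - lo)
    return dist

# Hypothetical bi-objective example: four pairwise incomparable points.
front = [(0, 4), (1, 3), (2, 2), (4, 0)]
print(crowding_distance(front))  # [inf, 1.0, 1.5, inf]
```

In this two-objective example, sorting by the second objective is exactly the reverse of sorting by the first, so each interior point has the same two neighbors in both sortings. With three or more objectives the sortings can disagree, which is the situation the abstract refers to.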

NE · Mar 5, 2022
Approximation Guarantees for the Non-Dominated Sorting Genetic Algorithm II (NSGA-II)

Weijie Zheng, Benjamin Doerr

Recent theoretical works have shown that the NSGA-II efficiently computes the full Pareto front when the population size is large enough. In this work, we study how well it approximates the Pareto front when the population size is smaller. For the OneMinMax benchmark, we point out situations in which the parents and offspring cover the Pareto front well, but the next population has large gaps on the Pareto front. Our mathematical proofs suggest that the reason for this undesirable behavior is that the NSGA-II, in the selection stage, computes the crowding distance once and then removes the individuals with smallest crowding distance without considering that a removal increases the crowding distance of some individuals. We then analyze two variants not prone to this problem. For the NSGA-II that updates the crowding distance after each removal (Kukkonen and Deb (2006)) and the steady-state NSGA-II (Nebro and Durillo (2009)), we prove that the gaps in the Pareto front are never more than a small constant factor larger than the theoretical minimum. This is the first mathematical work on the approximation ability of the NSGA-II and the first runtime analysis for the steady-state NSGA-II. Experiments also show the superior approximation ability of the two NSGA-II variants.

NE · Jun 18, 2022
From Understanding Genetic Drift to a Smart-Restart Mechanism for Estimation-of-Distribution Algorithms

Weijie Zheng, Benjamin Doerr

Estimation-of-distribution algorithms (EDAs) are optimization algorithms that learn a distribution on the search space from which good solutions can be sampled easily. A key parameter of most EDAs is the sample size (population size). If the population size is too small, the update of the probabilistic model builds on few samples, leading to the undesired effect of genetic drift. Too large population sizes avoid genetic drift, but slow down the process. Building on a recent quantitative analysis of how the population size leads to genetic drift, we design a smart-restart mechanism for EDAs. By stopping runs when the risk for genetic drift is high, it automatically runs the EDA in good parameter regimes. Via a mathematical runtime analysis, we prove a general performance guarantee for this smart-restart scheme. This in particular shows that in many situations where the optimal (problem-specific) parameter values are known, the restart scheme automatically finds these, leading to the asymptotically optimal performance. We also conduct an extensive experimental analysis. On four classic benchmark problems, we clearly observe the critical influence of the population size on the performance, and we find that the smart-restart scheme leads to a performance close to the one obtainable with optimal parameter values. Our results also show that previous theory-based suggestions for the optimal population size can be far from the optimal ones, leading to a performance clearly inferior to the one obtained via the smart-restart scheme. We also conduct experiments with PBIL (cross-entropy algorithm) on two combinatorial optimization problems from the literature, the max-cut problem and the bipartition problem. Again, we observe that the smart-restart mechanism finds much better values for the population size than those suggested in the literature, leading to a much better performance.

CV · Aug 3, 2024
Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers

Weijie Zheng, Xingjun Ma, Hanxun Huang et al.

With the advancement of vision transformers (ViTs) and self-supervised learning (SSL) techniques, pre-trained large ViTs have become the new foundation models for computer vision applications. However, studies have shown that, like convolutional neural networks (CNNs), ViTs are also susceptible to adversarial attacks, where subtle perturbations in the input can fool the model into making false predictions. This paper studies the transferability of such an adversarial vulnerability from a pre-trained ViT model to downstream tasks. We focus on sample-wise transfer attacks and propose a novel attack method termed Downstream Transfer Attack (DTA). For a given test image, DTA leverages a pre-trained ViT model to craft the adversarial example and then applies the adversarial example to attack a fine-tuned version of the model on a downstream dataset. During the attack, DTA identifies and exploits the most vulnerable layers of the pre-trained model guided by a cosine similarity loss to craft highly transferable attacks. Through extensive experiments with ViTs pre-trained by 3 distinct pre-training methods, 3 fine-tuning schemes, and 10 diverse downstream datasets, we show that DTA achieves an average attack success rate (ASR) exceeding 90%, surpassing existing methods by a large margin. When used with adversarial training, the adversarial examples generated by our DTA can significantly improve the model's robustness to different downstream transfer attacks.

41.5 · NE · Apr 29
Proven Advantage of Multiobjective Evolutionary Algorithms for Problems with Different Degrees of Conflict

Weijie Zheng

The field of multiobjective evolutionary algorithms (MOEAs) often emphasizes its popularity for optimization problems with conflicting objectives. However, it is still theoretically unknown how MOEAs perform compared with typical approaches outside this field. This paper conducts such a systematic theoretical comparison on problem classes with different degrees of conflict. With OneMaxMin$_k$ depicting $k\in[0..n]$ degrees of conflict, we show the difficulties of two typical non-MOEA approaches, the scalarization (weighted-sum) approach and the $ε$-constraint approach. We prove that for any set of weights, the set of optima formed by the scalarization approach cannot cover its full Pareto front for $k>2$. Although the constrained problems constructed by the $ε$-constraint approach ensure the full coverage, general ways (via exterior or nonparameter penalty functions) to solve these constrained problems encounter difficulties. The nonparameter penalty function approach cannot guarantee the full coverage, and the exterior penalty approach covers the Pareto front with an expected number of $O(\max\{k,1\}n\ln n)$ function evaluations, but only with careful settings of $ε$ and $r$ ($r>1/(ε+1-\lceil ε\rceil)$). In contrast, MOEAs efficiently solve OneMaxMin$_k$ without careful design. We prove the same expected runtime of $O(\max\{k,1\}n\ln n)$ for the (G)SEMO, MOEA/D, NSGA-II, and SMS-EMOA. Our brief discussions on a bi-objective LeadingOnes variant with different degrees of conflict show similar findings.

NE · Jul 25, 2024
A Crowding Distance That Provably Solves the Difficulties of the NSGA-II in Many-Objective Optimization

Weijie Zheng, Yao Gao, Benjamin Doerr

Recent theoretical works have shown that the NSGA-II can have enormous difficulties solving problems with more than two objectives. In contrast, algorithms like the NSGA-III or SMS-EMOA, differing from the NSGA-II only in the secondary selection criterion, provably perform well in these situations. To remedy this shortcoming of the NSGA-II, but at the same time keep the advantages of the widely accepted crowding distance, we use the insights of these previous works to define a variant of the crowding distance, called truthful crowding distance. Different from the classic crowding distance, it has for any number of objectives the desirable property that a small crowding distance value indicates that some other solution has a similar objective vector. Building on this property, we conduct mathematical runtime analyses for the NSGA-II with truthful crowding distance. We show that this algorithm can solve the many-objective versions of the OneMinMax, COCZ, LOTZ, and OJZJ$_k$ problems in the same (polynomial) asymptotic runtimes as the NSGA-III and the SMS-EMOA. This contrasts with the exponential lower bounds previously shown for the classic NSGA-II. For the bi-objective versions of these problems, our NSGA-II has a performance similar to that of the classic NSGA-II, gaining however from smaller admissible population sizes. For the bi-objective OneMinMax problem, we also observe a (minimally) better performance in approximating the Pareto front. These results suggest that our truthful version of the NSGA-II has the same good performance as the classic NSGA-II in two objectives, but can resolve the drastic problems in more than two objectives.

47.8 · NE · May 14
First Mathematical Runtime Analyses of Multi-Objective Evolutionary Algorithms for Multi-Valued Decision Variables

Mingfeng Li, Zheng Cheng, Weijie Zheng et al.

Problems defined on binary decision spaces have been intensively studied in the theory of multi-objective evolutionary algorithms (MOEAs). In contrast, no mathematical runtime analyses exist so far for MOEAs dealing with decision variables that take a finite number $r > 2$ of values, despite the prevalence of such problems in practice. In this work, we begin to fill this research gap. We analyze how the classic SEMO algorithm with unit-strength local mutation computes the Pareto front of an $r$-valued counterpart of the classic OneMinMax benchmark. For the expected number of function evaluations until the Pareto front is covered by the population of this MOEA, we prove an upper bound of $O(n^2 r^2 \log n)$ and a near-tight lower bound of $Ω(n^2 r (r + \log n))$. We can close the small remaining gap between these two bounds by considering a variant of the algorithm that accepts only strictly better solutions; for this variant, we show an upper bound of $O(n^2 r (r + \log n))$, matching our lower bound (which also holds for this variant). Our results suggest that classic MOEAs encounter no significant additional difficulties when dealing with multi-valued decision variables. However, significantly more advanced tools may be required to obtain tight bounds for algorithms with more complex population dynamics.

NE · May 3, 2025
Scalable Speed-ups for the SMS-EMOA from a Simple Aging Strategy

Mingfeng Li, Weijie Zheng, Benjamin Doerr

Different from single-objective evolutionary algorithms, where non-elitism is an established concept, multi-objective evolutionary algorithms almost always select the next population in a greedy fashion. In the only notable exception, Bian, Zhou, Li, and Qian (IJCAI 2023) proposed a stochastic selection mechanism for the SMS-EMOA and proved that it can speed up computing the Pareto front of the bi-objective jump benchmark with problem size $n$ and gap parameter $k$ by a factor of $\max\{1,2^{k/4}/n\}$. While this constitutes the first proven speed-up from non-elitist selection, suggesting a very interesting research direction, it has to be noted that a true speed-up only occurs for $k \ge 4\log_2(n)$, where the runtime is super-polynomial, and that the advantage reduces for larger numbers of objectives as shown in a later work. In this work, we propose a different non-elitist selection mechanism based on aging, which exempts individuals younger than a certain age from a possible removal. This remedies the two shortcomings of stochastic selection: We prove a speed-up by a factor of $\max\{1,Θ(k)^{k-1}\}$, regardless of the number of objectives. In particular, a positive speed-up can already be observed for constant $k$, the only setting for which polynomial runtimes can be witnessed. Overall, this result supports the use of non-elitist selection schemes, but suggests that aging-based mechanisms can be considerably more powerful than stochastic selection mechanisms.
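
As a small worked example of the arithmetic in this abstract, the following Python sketch evaluates the speed-up factor $\max\{1,2^{k/4}/n\}$ quoted for stochastic selection and the threshold $4\log_2(n)$ at which it starts to exceed 1. The function names are hypothetical. The aging-based factor $\max\{1,Θ(k)^{k-1}\}$ is not evaluated because the constant hidden in $Θ(k)$ is not given in the abstract.

```python
import math

def stochastic_speedup(n, k):
    """Speed-up factor max{1, 2^(k/4) / n} quoted for stochastic selection."""
    return max(1.0, 2 ** (k / 4) / n)

def stochastic_threshold(n):
    """Gap size 4 * log2(n); the factor above exceeds 1 only for larger k."""
    return 4 * math.log2(n)

n = 64
print(stochastic_threshold(n))     # 24.0
print(stochastic_speedup(n, 8))    # 1.0 (no speed-up for small k)
print(stochastic_speedup(n, 32))   # 4.0
```

For $n = 64$, no speed-up is guaranteed by this factor until the gap parameter exceeds 24, which matches the abstract's remark that the benefit appears only in the super-polynomial regime.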

LG · Nov 20, 2024
Towards Million-Scale Adversarial Robustness Evaluation With Stronger Individual Attacks

Yong Xie, Weijie Zheng, Hanxun Huang et al.

As deep learning models are increasingly deployed in safety-critical applications, evaluating their vulnerabilities to adversarial perturbations is essential for ensuring their reliability and trustworthiness. Over the past decade, a large number of white-box adversarial robustness evaluation methods (i.e., attacks) have been proposed, ranging from single-step to multi-step methods and from individual to ensemble methods. Despite these advances, challenges remain in conducting meaningful and comprehensive robustness evaluations, particularly when it comes to large-scale testing and ensuring evaluations reflect real-world adversarial risks. In this work, we focus on image classification models and propose a novel individual attack method, Probability Margin Attack (PMA), which defines the adversarial margin in the probability space rather than the logits space. We analyze the relationship between PMA and existing cross-entropy or logits-margin-based attacks, and show that PMA can outperform the current state-of-the-art individual methods. Building on PMA, we propose two types of ensemble attacks that balance effectiveness and efficiency. Furthermore, we create a million-scale dataset, CC1M, derived from the existing CC3M dataset, and use it to conduct the first million-scale white-box adversarial robustness evaluation of adversarially-trained ImageNet models. Our findings provide valuable insights into the robustness gaps between individual versus ensemble attacks and small-scale versus million-scale evaluations.
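
The difference between a margin in logit space and a margin in probability space, as described in this abstract, can be sketched in plain Python. This is only an illustration under stated assumptions: the function names, the sign convention (positive means misclassified), and the use of the strongest wrong class are choices made here, and the exact PMA loss in the paper may differ.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logit_margin(logits, y):
    """Largest wrong-class logit minus the true-class logit (unbounded)."""
    return max(z for j, z in enumerate(logits) if j != y) - logits[y]

def probability_margin(logits, y):
    """Same margin computed on softmax probabilities (always in [-1, 1])."""
    p = softmax(logits)
    return max(q for j, q in enumerate(p) if j != y) - p[y]

logits = [2.0, 1.0, 0.0]   # hypothetical model output, true class y = 0
scaled = [10 * z for z in logits]
print(logit_margin(logits, 0), logit_margin(scaled, 0))
print(probability_margin(logits, 0), probability_margin(scaled, 0))
```

Scaling the logits multiplies the logit margin by the same factor, while the probability margin stays bounded, which is one way to see that the two spaces can rank perturbations differently.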

CV · Mar 5
UniM: A Unified Any-to-Any Interleaved Multimodal Benchmark

Yanlin Li, Minghui Guo, Kaiwen Zhang et al.

In real-world multimodal applications, systems usually need to comprehend arbitrarily combined and interleaved multimodal inputs from users, while also generating outputs in any interleaved multimedia form. This capability defines the goal of any-to-any interleaved multimodal learning under a unified paradigm of understanding and generation, posing new challenges and opportunities for advancing Multimodal Large Language Models (MLLMs). To foster and benchmark this capability, this paper introduces the UniM benchmark, the first Unified Any-to-Any Interleaved Multimodal dataset. UniM contains 31K high-quality instances across 30 domains and 7 representative modalities: text, image, audio, video, document, code, and 3D, each requiring multiple intertwined reasoning and generation capabilities. We further introduce the UniM Evaluation Suite, which assesses models along three dimensions: Semantic Correctness & Generation Quality, Response Structure Integrity, and Interleaved Coherence. In addition, we propose UniMA, an agentic baseline model equipped with traceable reasoning for structured interleaved generation. Comprehensive experiments demonstrate the difficulty of UniM and highlight key challenges and directions for advancing unified any-to-any multimodal intelligence. The project page is https://any2any-mllm.github.io/unim.

NE · Dec 16, 2021
Mathematical Runtime Analysis for the Non-Dominated Sorting Genetic Algorithm II (NSGA-II)

Weijie Zheng, Benjamin Doerr

The non-dominated sorting genetic algorithm II (NSGA-II) is the most intensively used multi-objective evolutionary algorithm (MOEA) in real-world applications. However, in contrast to several simple MOEAs analyzed also via mathematical means, no such study exists for the NSGA-II so far. In this work, we show that mathematical runtime analyses are feasible also for the NSGA-II. As particular results, we prove that with a population size four times larger than the size of the Pareto front, the NSGA-II with two classic mutation operators and four different ways to select the parents satisfies the same asymptotic runtime guarantees as the SEMO and GSEMO algorithms on the basic OneMinMax and LeadingOnesTrailingZeros benchmarks. However, if the population size is only equal to the size of the Pareto front, then the NSGA-II cannot efficiently compute the full Pareto front: for an exponential number of iterations, the population will always miss a constant fraction of the Pareto front. Our experiments confirm the above findings.
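
To make the population-size conditions in this abstract concrete, here is a minimal sketch of the bi-objective OneMinMax benchmark in its usual form (number of zeros and number of ones, both maximized). That formulation, the helper names, and the small problem size are assumptions for illustration. The sketch checks by brute force that no objective vector dominates another and computes the Pareto front size.

```python
from itertools import product

def one_min_max(x):
    """OneMinMax: (number of zeros, number of ones), both to be maximized."""
    ones = sum(x)
    return (len(x) - ones, ones)

def dominates(u, v):
    """u dominates v: at least as good everywhere, strictly better somewhere."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

n = 5
values = {one_min_max(x) for x in product((0, 1), repeat=n)}
front_size = len(values)
assert not any(dominates(u, v) for u in values for v in values)
print(front_size)        # 6, that is, n + 1
print(4 * front_size)    # 24, the "four times the front size" population size
```

With $n = 5$, the two regimes in the abstract correspond to a population size of 24 (provably efficient) versus 6 (a constant fraction of the front is missed for exponentially many iterations).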

NE · Sep 14, 2021
Choosing the Right Algorithm With Hints From Complexity Theory

Shouda Wang, Weijie Zheng, Benjamin Doerr

Choosing a suitable algorithm from the myriads of different search heuristics is difficult when faced with a novel optimization problem. In this work, we argue that the purely academic question of what could be the best possible algorithm in a certain broad class of black-box optimizers can give fruitful indications in which direction to search for good established optimization heuristics. We demonstrate this approach on the recently proposed DLB benchmark, for which the only known results are $O(n^3)$ runtimes for several classic evolutionary algorithms and an $O(n^2 \log n)$ runtime for an estimation-of-distribution algorithm. Our finding that the unary unbiased black-box complexity is only $O(n^2)$ suggests the Metropolis algorithm as an interesting candidate and we prove that it solves the DLB problem in quadratic time. Since we also prove that better runtimes cannot be obtained in the class of unary unbiased algorithms, we shift our attention to algorithms that use the information of more parents to generate new solutions. An artificial algorithm of this type having an $O(n \log n)$ runtime leads to the result that the significance-based compact genetic algorithm (sig-cGA) can solve the DLB problem also in time $O(n \log n)$ with high probability. Our experiments show a remarkably good performance of the Metropolis algorithm, clearly the best of all algorithms regarded for reasonable problem sizes.

NE · Apr 14, 2021
When Non-Elitism Meets Time-Linkage Problems

Weijie Zheng, Qiaozhi Zhang, Huanhuan Chen et al.

Many real-world applications have the time-linkage property, and the only theoretical analysis was recently given by Zheng et al. (TEVC 2021) on their proposed time-linkage OneMax problem, OneMax$_{(0,1^n)}$. However, only two elitist algorithms, the (1+1)EA and the ($μ$+1)EA, were analyzed, and it is unknown whether the non-elitism mechanism could help to escape the local optima existing in OneMax$_{(0,1^n)}$. In general, there are few theoretical results on the benefits of non-elitism in evolutionary algorithms. In this work, we analyze the influence of non-elitism by comparing the performance of the elitist (1+$λ$)EA and its non-elitist counterpart, the (1,$λ$)EA. We prove that with probability $1-o(1)$ the (1+$λ$)EA will get stuck in the local optima and cannot find the global optimum, but with probability $1$, the (1,$λ$)EA can reach the global optimum and its expected runtime is $O(n^{3+c}\log n)$ with $λ=c \log_{\frac{e}{e-1}} n$ for the constant $c\ge 1$. Noting that a smaller offspring size is helpful for escaping from the local optima, we further resort to the compact genetic algorithm, where only two individuals are sampled to update the probabilistic model, and prove its expected runtime of $O(n^3\log n)$. Our computational experiments also verify the efficiency of the two non-elitist algorithms.

NE · Dec 14, 2020
Theoretical Analyses of Multiobjective Evolutionary Algorithms on Multimodal Objectives

Weijie Zheng, Benjamin Doerr

The theoretical understanding of MOEAs is lagging far behind their success in practice. In particular, previous theory work considers mostly easy problems that are composed of unimodal objectives. As a first step towards a deeper understanding of how evolutionary algorithms solve multimodal multiobjective problems, we propose the OJZJ problem, a bi-objective problem composed of two objectives isomorphic to the classic jump function benchmark. We prove that SEMO with probability one does not compute the full Pareto front, regardless of the runtime. In contrast, for all problem sizes $n$ and all jump sizes ${k \in [4..\frac n2 - 1]}$, the global SEMO (GSEMO) covers the Pareto front in an expected number of $Θ((n-2k)n^{k})$ iterations. For $k = o(n)$, we also show the tighter bound $\frac 32 e n^{k+1} \pm o(n^{k+1})$, which might be the first runtime bound for an MOEA that is tight apart from lower-order terms. We also combine the GSEMO with two approaches that showed advantages in single-objective multimodal problems. When using the GSEMO with a heavy-tailed mutation operator, the expected runtime improves by a factor of at least $k^{Ω(k)}$. When adapting the recent stagnation-detection strategy of Rajabi and Witt (2022) to the GSEMO, the expected runtime also improves by a factor of at least $k^{Ω(k)}$ and surpasses the heavy-tailed GSEMO by a small polynomial factor in $k$. Via an experimental analysis, we show that these asymptotic differences are visible already for small problem sizes: A factor-$5$ speed-up from heavy-tailed mutation and a factor-$10$ speed-up from stagnation detection can be observed already for jump size $4$ and problem sizes between $10$ and $50$. Overall, our results show that the ideas recently developed to aid single-objective evolutionary algorithms to cope with local optima can be effectively employed also in multiobjective optimization.
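
The structure of a bi-objective problem built from two jump-like objectives can be sketched as follows. The abstract does not spell out the definition, so the formulas below are an assumption: the classic jump function applied to the number of ones for the first objective and to the number of zeros for the second, both maximized. The helper names are hypothetical, and the Pareto front is found by brute force for $n = 10$, $k = 4$.

```python
from itertools import product

def jump(count, n, k):
    """Classic jump function in terms of the count of 'good' bits."""
    if count <= n - k or count == n:
        return k + count
    return n - count  # the valley just before the optimum

def ojzj(x, k):
    """Assumed OJZJ form: jump on the ones count and jump on the zeros count."""
    n, ones = len(x), sum(x)
    return (jump(ones, n, k), jump(n - ones, n, k))

def dominates(u, v):
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

n, k = 10, 4
values = {ojzj(x, k) for x in product((0, 1), repeat=n)}
front = sorted(v for v in values if not any(dominates(u, v) for u in values))
print(front)  # [(4, 14), (8, 10), (9, 9), (10, 8), (14, 4)]
```

Under this assumed definition, the two extreme points $(4, 14)$ and $(14, 4)$ are separated from the inner part of the front by a valley of dominated solutions, which is why an algorithm that flips only one bit at a time cannot reach them from the inner part.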

NE · Apr 26, 2020
Analysis of Evolutionary Algorithms on Fitness Function with Time-linkage Property

Weijie Zheng, Huanhuan Chen, Xin Yao

In real-world applications, many optimization problems have the time-linkage property, that is, the objective function value relies on the current solution as well as the historical solutions. Although the rigorous theoretical analysis of evolutionary algorithms has developed rapidly in the last two decades, it remains an open problem to theoretically understand the behavior of evolutionary algorithms on time-linkage problems. This paper takes the first step to rigorously analyze evolutionary algorithms for time-linkage functions. Based on the basic OneMax function, we propose a time-linkage function where the first bit value of the last time step is integrated but has a different preference from the current first bit. We prove that with probability $1-o(1)$, randomized local search and the $(1+1)$ EA cannot find the optimum, and with probability $1-o(1)$, the $(μ+1)$ EA is able to reach the optimum.
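
One function consistent with this description is OneMax on the current bit string minus a penalty of $n$ when the first bit of the previous time step is 1. This concrete form and the name `time_linkage_onemax` are assumptions made here for illustration and should be checked against the paper. The sketch also shows, by brute force, why a pair whose current first bit is 1 cannot be followed by an equally good pair.

```python
from itertools import product

def time_linkage_onemax(x_prev, x_cur):
    """Assumed form: OneMax(x_cur) minus n if the previous first bit was 1."""
    n = len(x_cur)
    return sum(x_cur) - n * x_prev[0]

n = 4
# Best case: previous first bit 0, current string all ones.
print(time_linkage_onemax((0, 1, 1, 1), (1, 1, 1, 1)))   # 4

# A pair whose current first bit is already 1 ...
x_prev, x_cur = (0, 0, 0, 0), (1, 1, 0, 0)
stored = time_linkage_onemax(x_prev, x_cur)              # 2
# ... forces every successor pair (x_cur, x_next) to pay the penalty n.
best_next = max(time_linkage_onemax(x_cur, x_next)
                for x_next in product((0, 1), repeat=n))
print(stored, best_next)                                  # 2 0
```

Under the additional assumption that an elitist algorithm only accepts a successor pair that is at least as good as the stored one, the last line illustrates the kind of trap the abstract describes: every successor is strictly worse, so the search cannot move on.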

NE · Apr 15, 2020
From Understanding Genetic Drift to a Smart-Restart Parameter-less Compact Genetic Algorithm

Benjamin Doerr, Weijie Zheng

One of the key difficulties in using estimation-of-distribution algorithms is choosing the population size(s) appropriately: Too small values lead to genetic drift, which can cause enormous difficulties. In the regime with no genetic drift, however, often the runtime is roughly proportional to the population size, which renders large population sizes inefficient. Based on a recent quantitative analysis of which population sizes lead to genetic drift, we propose a parameter-less version of the compact genetic algorithm that automatically finds a suitable population size without spending too much time in situations unfavorable due to genetic drift. We prove a mathematical runtime guarantee for this algorithm and conduct an extensive experimental analysis on four classic benchmark problems both without and with additive centered Gaussian posterior noise. The former shows that under a natural assumption, our algorithm has a performance very similar to the one obtainable from the best problem-specific population size. The latter confirms that missing the right population size in the original cGA can be detrimental and that previous theory-based suggestions for the population size can be far away from the right values; it also shows that our algorithm as well as a previously proposed parameter-less variant of the cGA based on parallel runs avoid such pitfalls. Comparing the two parameter-less approaches, ours profits from its ability to abort runs which are likely to be stuck in a genetic drift situation.

NE · Oct 31, 2019
Sharp Bounds for Genetic Drift in Estimation of Distribution Algorithms

Benjamin Doerr, Weijie Zheng

Estimation of Distribution Algorithms (EDAs) are one branch of Evolutionary Algorithms (EAs) in the broad sense that they evolve a probabilistic model instead of a population. Many existing algorithms fall into this category. Analogous to genetic drift in EAs, EDAs also encounter the phenomenon that updates of the probabilistic model not justified by the fitness move the sampling frequencies to the boundary values. This can result in a considerable performance loss. This paper proves the first sharp estimates of the boundary hitting time of the sampling frequency of a neutral bit for several univariate EDAs. For the UMDA that selects $μ$ best individuals from $λ$ offspring each generation, we prove that the expected first iteration when the frequency of the neutral bit leaves the middle range $[\tfrac 14, \tfrac 34]$ and the expected first time it is absorbed in 0 or 1 are both $Θ(μ)$. The corresponding hitting times are $Θ(K^2)$ for the cGA with hypothetical population size $K$. This paper further proves that for PBIL with parameters $μ$, $λ$, and $ρ$, in an expected number of $Θ(μ/ρ^2)$ iterations the sampling frequency of a neutral bit leaves the interval $[Θ(ρ/μ),1-Θ(ρ/μ)]$ and then always the same value is sampled for this bit, that is, the frequency approaches the corresponding boundary value with maximum speed. For the lower bounds implicit in these statements, we also show exponential tail bounds. If a bit is not necessarily neutral, but is either neutral or has a preference for ones, then the lower bounds on the times to reach a low frequency value still hold. An analogous statement holds for bits that are neutral or prefer the value zero.
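
The genetic drift of a neutral bit in the cGA can be simulated with a few lines of Python. This is a simplified sketch, not the paper's analysis: it tracks a single neutral bit, assumes the standard cGA update (two samples per iteration, frequency moved by $1/K$ toward the winner when the two samples differ), and models neutrality by choosing the winner uniformly at random. The function name is hypothetical, and $K$ is assumed to be divisible by 4 so that integer arithmetic is exact.

```python
import random

def cga_neutral_exit_time(K, rng):
    """Iterations until a neutral bit's frequency leaves [1/4, 3/4]."""
    count = K // 2                     # frequency is count / K, starting at 1/2
    t = 0
    while K // 4 <= count <= 3 * K // 4:
        p = count / K
        x = rng.random() < p           # bit value in the first sample
        y = rng.random() < p           # bit value in the second sample
        if x != y:
            # Neutral bit: fitness cannot prefer either value, so the
            # direction of the update is a fair coin flip.
            count += 1 if rng.random() < 0.5 else -1
        t += 1
    return t

rng = random.Random(0)
for K in (20, 40, 80):
    runs = [cga_neutral_exit_time(K, rng) for _ in range(200)]
    print(K, sum(runs) / len(runs))
```

Doubling $K$ should roughly quadruple the average exit time in such a simulation, in line with the $Θ(K^2)$ bound stated in the abstract, although a small experiment like this can only illustrate the trend.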

NE · Dec 9, 2018
Working Principles of Binary Differential Evolution

Benjamin Doerr, Weijie Zheng

We conduct a first fundamental analysis of the working principles of binary differential evolution (BDE), an optimization heuristic for binary decision variables that was derived by Gong and Tuson (2007) from the very successful classic differential evolution (DE) for continuous optimization. We show that unlike most other optimization paradigms, it is stable in the sense that neutral bit values are sampled with probability close to $1/2$ for a long time. This is generally a desirable property, however, it makes it harder to find the optima for decision variables with small influence on the objective function. This can result in an optimization time exponential in the dimension when optimizing simple symmetric functions like OneMax. On the positive side, BDE quickly detects and optimizes the most important decision variables. For example, dominant bits converge to the optimal value in time logarithmic in the population size. This enables BDE to optimize the most important bits very fast. Overall, our results indicate that BDE is an interesting optimization paradigm having characteristics significantly different from classic evolutionary algorithms or estimation-of-distribution algorithms (EDAs). On the technical side, we observe that the strong stochastic dependencies in the random experiment describing a run of BDE prevent us from proving all desired results with the mathematical rigor that was successfully used in the analysis of other evolutionary algorithms. Inspired by mean-field approaches in statistical physics we propose a more independent variant of BDE, show experimentally its similarity to BDE, and prove some statements rigorously only for the independent variant. Such a semi-rigorous approach might be interesting for other problems in evolutionary computation where purely mathematical methods failed so far.