Johannes Lengler

NE
h-index45
19papers
724citations
Novelty48%
AI Score46

19 Papers

LGJan 24, 2025
Humanity's Last Exam

Long Phan, Alice Gatti, Ziwen Han et al. · amazon-science, apple-ml

Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90\% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.

PRApr 29
Degree-dependent and distance-dependent contact rates interpolate between explosive, exponential and polynomial epidemic growth

Zylan Benjert, Júlia Komjáthy, Johannes Lengler et al.

It is a fundamental question in epidemiology to estimate, model and predict the growth rate of a pandemic. Analogously, analysing the diffusion of innovation, (fake) news, memes, and rumours is of key importance in the social sciences. The resulting epidemic growth curves can be classified according to their growth rates. These have been found to range from exponential to both faster super-exponential curves and slower subexponential or polynomial curves. Previous research has lacked a unified explanatory framework capable of accommodating super-exponential, (stretched) exponential, and polynomial growth patterns within the same contact network. In this paper we propose a simple agent-based network model that can capture all these phases. We provide such a framework by modelling how transmission rates depend on spatial distance and on individuals' numbers of contacts. By comparing the growth rate of spreading processes with or without degree-dependent and/or distance-dependent contact rates through data-driven and synthetic simulations on real and modelled networks with underlying geometry, we find evidence that even a 'sublinear presence' of these causes may cause a significant slow down of the growth rate on the same underlying network. We find that the growth rate is governed by a combination of three factors: geometry, the prevalence of weak ties, and superspreaders. We confirm our results with rigorous proofs in a theoretical model, using a spatial multiscale-argument in long-range heterogeneous first passage percolation. Our results give a plausible explanation of why the consecutive waves of a single pandemic can differ in their growth even if their spreading mechanisms are similar.

SIMar 10
Stable Boundaries of Opinion Dynamics in Heterogeneous Spatial Complex Networks

Mats Bierwirth, Johannes Lengler

We investigate majority-vote opinion dynamics on Geometric Inhomogeneous Random Graphs (GIRGs), a powerful model for spatial complex networks. In contrast to classic coarsening dynamics where a single opinion typically achieves global consensus, our simulations reveal that sufficiently large, localized opinion domains do not disappear. Instead, they stabilize, leading to a persistent coexistence of competing opinions. To understand the mechanism behind this arrested coarsening, we develop and analyze a tractable mean-field model of the interface between two opinion domains. Our main theoretical result rigorously establishes the existence of a stable, non-trivial limiting distribution for the interface profile in a mean-field analysis. This demonstrates that the boundary between opinions is stationary, providing a mathematical explanation for how complex network geometry can support robust opinion diversity in social systems.

NEApr 15, 2024
Plus Strategies are Exponentially Slower for Planted Optima of Random Height

Johannes Lengler, Leon Schiller, Oliver Sieberling

We compare the $(1,λ)$-EA and the $(1 + λ)$-EA on the recently introduced benchmark DisOM, which is the OneMax function with randomly planted local optima. Previous work showed that if all local optima have the same relative height, then the plus strategy never loses more than a factor $O(n\log n)$ compared to the comma strategy. Here we show that even small random fluctuations in the heights of the local optima have a devastating effect for the plus strategy and lead to super-polynomial runtimes. On the other hand, due to their ability to escape local optima, comma strategies are unaffected by the height of the local optima and remain efficient. Our results hold for a broad class of possible distortions and show that the plus strategy, but not the comma strategy, is generally deceived by sparse unstructured fluctuations of a smooth landscape.

NEOct 26, 2020
Runtime analysis of the (mu+1)-EA on the Dynamic BinVal function

Johannes Lengler, Simone Riedi

We study evolutionary algorithms in a dynamic setting, where for each generation a different fitness function is chosen, and selection is performed with respect to the current fitness function. Specifically, we consider Dynamic BinVal, in which the fitness functions for each generation is given by the linear function BinVal, but in each generation the order of bits is randomly permuted. For the (1+1)-EA it was known that there is an efficiency threshold $c_0$ for the mutation parameter, at which the runtime switches from quasilinear to exponential. A previous empirical evidence suggested that for larger population size $μ$, the threshold may increase. We prove that this is at least the case in an $\varepsilon$-neighborhood around the optimum: the threshold of the (μ+1)-EA becomes arbitrarily large if the $μ$ is chosen large enough. However, the most surprising result is obtained by a second order analysis for $μ=2$: the threshold INcreases with increasing proximity to the optimum. In particular, the hardest region for optimization is NOT around the optimum.

NEApr 21, 2020
Large Population Sizes and Crossover Help in Dynamic Environments

Johannes Lengler, Jonas Meier

Dynamic linear functions on the hypercube are functions which assign to each bit a positive weight, but the weights change over time. Throughout optimization, these functions maintain the same global optimum, and never have defecting local optima. Nevertheless, it was recently shown [Lengler, Schaller, FOCI 2019] that the $(1+1)$-Evolutionary Algorithm needs exponential time to find or approximate the optimum for some algorithm configurations. In this paper, we study the effect of larger population sizes for Dynamic BinVal, the extremal form of dynamic linear functions. We find that moderately increased population sizes extend the range of efficient algorithm configurations, and that crossover boosts this positive effect substantially. Remarkably, similar to the static setting of monotone functions in [Lengler, Zou, FOGA 2019], the hardest region of optimization for $(μ+1)$-EA is not close the optimum, but far away from it. In contrast, for the $(μ+1)$-GA, the region around the optimum is the hardest region in all studied cases.

NEJul 30, 2019
Exponential Slowdown for Larger Populations: The $(μ+1)$-EA on Monotone Functions

Johannes Lengler, Xun Zou

Pseudo-Boolean monotone functions are unimodal functions which are trivial to optimize for some hillclimbers, but are challenging for a surprising number of evolutionary algorithms (EAs). A general trend is that EAs are efficient if parameters like the mutation rate are set conservatively, but may need exponential time otherwise. In particular, it was known that the $(1+1)$-EA and the $(1+λ)$-EA can optimize every monotone function in pseudolinear time if the mutation rate is $c/n$ for some $c<1$, but they need exponential time for some monotone functions for $c>2.2$. The second part of the statement was also known for the $(μ+1)$-EA. In this paper we show that the first statement does not apply to the $(μ+1)$-EA. More precisely, we prove that for every constant $c>0$ there is a constant integer $μ_0$ such that the $(μ+1)$-EA with mutation rate $c/n$ and population size $μ_0\leμ\le n$ needs superpolynomial time to optimize some monotone functions. Thus, increasing the population size by just a constant has devastating effects on the performance. This is in stark contrast to many other benchmark functions on which increasing the population size either increases the performance significantly, or affects performance mildly. The reason why larger populations are harmful lies in the fact that larger populations may temporarily decrease selective pressure on parts of the population. This allows unfavorable mutations to accumulate in single individuals and their descendants. If the population moves sufficiently fast through the search space, such unfavorable descendants can become ancestors of future generations, and the bad mutations are preserved. Remarkably, this effect only occurs if the population renews itself sufficiently fast, which can only happen far away from the optimum. This is counter-intuitive since usually optimization gets harder as we approach the optimum.

NEFeb 7, 2019
Self-Adjusting Mutation Rates with Provably Optimal Success Rules

Benjamin Doerr, Carola Doerr, Johannes Lengler

The one-fifth success rule is one of the best-known and most widely accepted techniques to control the parameters of evolutionary algorithms. While it is often applied in the literal sense, a common interpretation sees the one-fifth success rule as a family of success-based updated rules that are determined by an update strength $F$ and a success rate. We analyze in this work how the performance of the (1+1) Evolutionary Algorithm on LeadingOnes depends on these two hyper-parameters. Our main result shows that the best performance is obtained for small update strengths $F=1+o(1)$ and success rate $1/e$. We also prove that the running time obtained by this parameter setting is, apart from lower order terms, the same that is achieved with the best fitness-dependent mutation rate. We show similar results for the resampling variant of the (1+1) Evolutionary Algorithm, which enforces to flip at least one bit per iteration.

NEAug 16, 2018
The linear hidden subset problem for the (1+1) EA with scheduled and adaptive mutation rates

Hafsteinn Einarsson, Marcelo Matheus Gauy, Johannes Lengler et al.

We study unbiased $(1+1)$ evolutionary algorithms on linear functions with an unknown number $n$ of bits with non-zero weight. Static algorithms achieve an optimal runtime of $O(n (\ln n)^{2+ε})$, however, it remained unclear whether more dynamic parameter policies could yield better runtime guarantees. We consider two setups: one where the mutation rate follows a fixed schedule, and one where it may be adapted depending on the history of the run. For the first setup, we give a schedule that achieves a runtime of $(1\pm o(1))βn \ln n$, where $β\approx 3.552$, which is an asymptotic improvement over the runtime of the static setup. Moreover, we show that no schedule admits a better runtime guarantee and that the optimal schedule is essentially unique. For the second setup, we show that the runtime can be further improved to $(1\pm o(1)) e n \ln n$, which matches the performance of algorithms that know $n$ in advance. Finally, we study the related model of initial segment uncertainty with static position-dependent mutation rates, and derive asymptotically optimal lower bounds. This answers a question by Doerr, Doerr, and Kötzing.

PRAug 3, 2018
When Does Hillclimbing Fail on Monotone Functions: An entropy compression argument

Johannes Lengler, Anders Martinsson, Angelika Steger

Hillclimbing is an essential part of any optimization algorithm. An important benchmark for hillclimbing algorithms on pseudo-Boolean functions $f: \{0,1\}^n \to \mathbb{R}$ are (strictly) montone functions, on which a surprising number of hillclimbers fail to be efficient. For example, the $(1+1)$-Evolutionary Algorithm is a standard hillclimber which flips each bit independently with probability $c/n$ in each round. Perhaps surprisingly, this algorithm shows a phase transition: it optimizes any monotone pseudo-boolean function in quasilinear time if $c<1$, but there are monotone functions for which the algorithm needs exponential time if $c>2.2$. But so far it was unclear whether the threshold is at $c=1$. In this paper we show how Moser's entropy compression argument can be adapted to this situation, that is, we show that a long runtime would allow us to encode the random steps of the algorithm with less bits than their entropy. Thus there exists a $c_0 > 1$ such that for all $0<c\le c_0$ the $(1+1)$-Evolutionary Algorithm with rate $c/n$ finds the optimum in $O(n \log^2 n)$ steps in expectation.

NEJun 6, 2018
Bounding Bloat in Genetic Programming

Benjamin Doerr, Timo Kötzing, J. A. Gregor Lagodzinski et al.

While many optimization problems work with a fixed number of decision variables and thus a fixed-length representation of possible solutions, genetic programming (GP) works on variable-length representations. A naturally occurring problem is that of bloat (unnecessary growth of solutions) slowing down optimization. Theoretical analyses could so far not bound bloat and required explicit assumptions on the magnitude of bloat. In this paper we analyze bloat in mutation-based genetic programming for the two test functions ORDER and MAJORITY. We overcome previous assumptions on the magnitude of bloat and give matching or close-to-matching upper and lower bounds for the expected optimization time. In particular, we show that the (1+1) GP takes (i) $Θ(T_{init} + n \log n)$ iterations with bloat control on ORDER as well as MAJORITY; and (ii) $O(T_{init} \log T_{init} + n (\log n)^3)$ and $Ω(T_{init} + n \log n)$ (and $Ω(T_{init} \log T_{init})$ for $n=1$) iterations without bloat control on MAJORITY.

NEMay 25, 2018
Destructiveness of Lexicographic Parsimony Pressure and Alleviation by a Concatenation Crossover in Genetic Programming

Timo Kötzing, J. A. Gregor Lagodzinski, Johannes Lengler et al.

For theoretical analyses there are two specifics distinguishing GP from many other areas of evolutionary computation. First, the variable size representations, in particular yielding a possible bloat (i.e. the growth of individuals with redundant parts). Second, the role and realization of crossover, which is particularly central in GP due to the tree-based representation. Whereas some theoretical work on GP has studied the effects of bloat, crossover had a surprisingly little share in this work. We analyze a simple crossover operator in combination with local search, where a preference for small solutions minimizes bloat (lexicographic parsimony pressure); the resulting algorithm is denoted Concatenation Crossover GP. For this purpose three variants of the well-studied MAJORITY test function with large plateaus are considered. We show that the Concatenation Crossover GP can efficiently optimize these test functions, while local search cannot be efficient for all three variants independent of employing bloat control.

NEMar 25, 2018
A General Dichotomy of Evolutionary Algorithms on Monotone Functions

Johannes Lengler

It is known that the evolutionary algorithm $(1+1)$-EA with mutation rate $c/n$ optimises every monotone function efficiently if $c<1$, and needs exponential time on some monotone functions (HotTopic functions) if $c\geq 2.2$. We study the same question for a large variety of algorithms, particularly for $(1+λ)$-EA, $(μ+1)$-EA, $(μ+1)$-GA, their fast counterparts like fast $(1+1)$-EA, and for $(1+(λ,λ))$-GA. We find that all considered mutation-based algorithms show a similar dichotomy for HotTopic functions, or even for all monotone functions. For the $(1+(λ,λ))$-GA, this dichotomy is in the parameter $cγ$, which is the expected number of bit flips in an individual after mutation and crossover, neglecting selection. For the fast algorithms, the dichotomy is in $m_2/m_1$, where $m_1$ and $m_2$ are the first and second falling moment of the number of bit flips. Surprisingly, the range of efficient parameters is not affected by either population size $μ$ nor by the offspring population size $λ$. The picture changes completely if crossover is allowed. The genetic algorithms $(μ+1)$-GA and fast $(μ+1)$-GA are efficient for arbitrary mutations strengths if $μ$ is large enough.

NEMar 12, 2018
Sorting by Swaps with Noisy Comparisons

Tomáš Gavenčiak, Barbara Geissmann, Johannes Lengler

We study sorting of permutations by random swaps if each comparison gives the wrong result with some fixed probability $p<1/2$. We use this process as prototype for the behaviour of randomized, comparison-based optimization heuristics in the presence of noisy comparisons. As quality measure, we compute the expected fitness of the stationary distribution. To measure the runtime, we compute the minimal number of steps after which the average fitness approximates the expected fitness of the stationary distribution. We study the process where in each round a random pair of elements at distance at most $r$ are compared. We give theoretical results for the extreme cases $r=1$ and $r=n$, and experimental results for the intermediate cases. We find a trade-off between faster convergence (for large $r$) and better quality of the solution after convergence (for small $r$).

NEDec 4, 2017
Drift Analysis

Johannes Lengler

Drift analysis is one of the major tools for analysing evolutionary algorithms and nature-inspired search heuristics. In this chapter we give an introduction to drift analysis and give some examples of how to use it for the analysis of evolutionary algorithms.

COAug 10, 2016
Drift Analysis and Evolutionary Algorithms Revisited

Johannes Lengler, Angelika Steger

One of the easiest randomized greedy optimization algorithms is the following evolutionary algorithm which aims at maximizing a boolean function $f:\{0,1\}^n \to {\mathbb R}$. The algorithm starts with a random search point $ξ\in \{0,1\}^n$, and in each round it flips each bit of $ξ$ with probability $c/n$ independently at random, where $c>0$ is a fixed constant. The thus created offspring $ξ'$ replaces $ξ$ if and only if $f(ξ') \ge f(ξ)$. The analysis of the runtime of this simple algorithm on monotone and on linear functions turned out to be highly non-trivial. In this paper we review known results and provide new and self-contained proofs of partly stronger results.

NEApr 8, 2016
The (1+1) Elitist Black-Box Complexity of LeadingOnes

Carola Doerr, Johannes Lengler

One important goal of black-box complexity theory is the development of complexity models allowing to derive meaningful lower bounds for whole classes of randomized search heuristics. Complementing classical runtime analysis, black-box models help us understand how algorithmic choices such as the population size, the variation operators, or the selection rules influence the optimization time. One example for such a result is the $Ω(n \log n)$ lower bound for unary unbiased algorithms on functions with a unique global optimum [Lehre/Witt, GECCO 2010], which tells us that higher arity operators or biased sampling strategies are needed when trying to beat this bound. In lack of analyzing techniques, almost no non-trivial bounds are known for other restricted models. Proving such bounds therefore remains to be one of the main challenges in black-box complexity theory. With this paper we contribute to our technical toolbox for lower bound computations by proposing a new type of information-theoretic argument. We regard the permutation- and bit-invariant version of \textsc{LeadingOnes} and prove that its (1+1) elitist black-box complexity is $Ω(n^2)$, a bound that is matched by (1+1)-type evolutionary algorithms. The (1+1) elitist complexity of \textsc{LeadingOnes} is thus considerably larger than its unrestricted one, which is known to be of order $n\log\log n$ [Afshani et al., 2013].

NEAug 27, 2015
Introducing Elitist Black-Box Models: When Does Elitist Selection Weaken the Performance of Evolutionary Algorithms?

Carola Doerr, Johannes Lengler

Black-box complexity theory provides lower bounds for the runtime of black-box optimizers like evolutionary algorithms and serves as an inspiration for the design of new genetic algorithms. Several black-box models covering different classes of algorithms exist, each highlighting a different aspect of the algorithms under considerations. In this work we add to the existing black-box notions a new \emph{elitist black-box model}, in which algorithms are required to base all decisions solely on (a fixed number of) the best search points sampled so far. Our model combines features of the ranking-based and the memory-restricted black-box models with elitist selection. We provide several examples for which the elitist black-box complexity is exponentially larger than that the respective complexities in all previous black-box models, thus showing that the elitist black-box complexity can be much closer to the runtime of typical evolutionary algorithms. We also introduce the concept of $p$-Monte Carlo black-box complexity, which measures the time it takes to optimize a problem with failure probability at most $p$. Even for small~$p$, the $p$-Monte Carlo black-box complexity of a function class $\mathcal F$ can be smaller by an exponential factor than its typically regarded Las Vegas complexity (which measures the \emph{expected} time it takes to optimize $\mathcal F$).

NEApr 10, 2015
OneMax in Black-Box Models with Several Restrictions

Carola Doerr, Johannes Lengler

Black-box complexity studies lower bounds for the efficiency of general-purpose black-box optimization algorithms such as evolutionary algorithms and other search heuristics. Different models exist, each one being designed to analyze a different aspect of typical heuristics such as the memory size or the variation operators in use. While most of the previous works focus on one particular such aspect, we consider in this work how the combination of several algorithmic restrictions influence the black-box complexity. Our testbed are so-called OneMax functions, a classical set of test functions that is intimately related to classic coin-weighing problems and to the board game Mastermind. We analyze in particular the combined memory-restricted ranking-based black-box complexity of OneMax for different memory sizes. While its isolated memory-restricted as well as its ranking-based black-box complexity for bit strings of length $n$ is only of order $n/\log n$, the combined model does not allow for algorithms being faster than linear in $n$, as can be seen by standard information-theoretic considerations. We show that this linear bound is indeed asymptotically tight. Similar results are obtained for other memory- and offspring-sizes. Our results also apply to the (Monte Carlo) complexity of OneMax in the recently introduced elitist model, in which only the best-so-far solution can be kept in the memory. Finally, we also provide improved lower bounds for the complexity of OneMax in the regarded models. Our result enlivens the quest for natural evolutionary algorithms optimizing OneMax in $o(n \log n)$ iterations.