CVMay 25, 2022
How explainable are adversarially-robust CNNs?Mehdi Nourelahi, Lars Kotthoff, Peijie Chen et al.
Three important criteria of existing convolutional neural networks (CNNs) are (1) test-set accuracy; (2) out-of-distribution accuracy; and (3) explainability. While these criteria have been studied independently, their relationship is unknown. For example, do CNNs that have a stronger out-of-distribution performance have also stronger explainability? Furthermore, most prior feature-importance studies only evaluate methods on 2-3 common vanilla ImageNet-trained CNNs, leaving it unknown how these methods generalize to CNNs of other architectures and training algorithms. Here, we perform the first, large-scale evaluation of the relations of the three criteria using 9 feature-importance methods and 12 ImageNet-trained CNNs that are of 3 training algorithms and 5 CNN architectures. We find several important insights and recommendations for ML practitioners. First, adversarially robust CNNs have a higher explainability score on gradient-based attribution methods (but not CAM-based or perturbation-based methods). Second, AdvProp models, despite being highly accurate more than both vanilla and robust models alone, are not superior in explainability. Third, among 9 feature attribution methods tested, GradCAM and RISE are consistently the best methods. Fourth, Insertion and Deletion are biased towards vanilla and robust models respectively, due to their strong correlation with the confidence score distributions of a CNN. Fifth, we did not find a single CNN to be the best in all three criteria, which interestingly suggests that CNNs are harder to interpret as they become more accurate.
AIDec 18, 2025
Best Practices For Empirical Meta-Algorithmic Research: Guidelines from the COSEAL Research NetworkTheresa Eimer, Lennart Schäpermeier, André Biedenkapp et al.
Empirical research on meta-algorithmics, such as algorithm selection, configuration, and scheduling, often relies on extensive and thus computationally expensive experiments. With the large degree of freedom we have over our experimental setup and design comes a plethora of possible error sources that threaten the scalability and validity of our scientific insights. Best practices for meta-algorithmic research exist, but they are scattered between different publications and fields, and continue to evolve separately from each other. In this report, we collect good practices for empirical meta-algorithmic research across the subfields of the COSEAL community, encompassing the entire experimental cycle: from formulating research questions and selecting an experimental design, to executing experiments, and ultimately, analyzing and presenting results impartially. It establishes the current state-of-the-art practices within meta-algorithmic research and serves as a guideline to both new researchers and practitioners in meta-algorithmic fields.
AIJan 29
LLaMEA-SAGE: Guiding Automated Algorithm Design with Structural Feedback from Explainable AINiki van Stein, Anna V. Kononova, Lars Kotthoff et al.
Large language models have enabled automated algorithm design (AAD) by generating optimization algorithms directly from natural-language prompts. While evolutionary frameworks such as LLaMEA demonstrate strong exploratory capabilities across the algorithm design space, their search dynamics are entirely driven by fitness feedback, leaving substantial information about the generated code unused. We propose a mechanism for guiding AAD using feedback constructed from graph-theoretic and complexity features extracted from the abstract syntax trees of the generated algorithms, based on a surrogate model learned over an archive of evaluated solutions. Using explainable AI techniques, we identify features that substantially affect performance and translate them into natural-language mutation instructions that steer subsequent LLM-based code generation without restricting expressivity. We propose LLaMEA-SAGE, which integrates this feature-driven guidance into LLaMEA, and evaluate it across several benchmarks. We show that the proposed structured guidance achieves the same performance faster than vanilla LLaMEA in a small controlled experiment. In a larger-scale experiment using the MA-BBOB suite from the GECCO-MA-BBOB competition, our guided approach achieves superior performance compared to state-of-the-art AAD methods. These results demonstrate that signals derived from code can effectively bias LLM-driven algorithm evolution, bridging the gap between code structure and human-understandable performance feedback in automated algorithm design.
NEMar 20, 2025
Code Evolution Graphs: Understanding Large Language Model Driven Design of AlgorithmsNiki van Stein, Anna V. Kononova, Lars Kotthoff et al.
Large Language Models (LLMs) have demonstrated great promise in generating code, especially when used inside an evolutionary computation framework to iteratively optimize the generated algorithms. However, in some cases they fail to generate competitive algorithms or the code optimization stalls, and we are left with no recourse because of a lack of understanding of the generation process and generated codes. We present a novel approach to mitigate this problem by enabling users to analyze the generated codes inside the evolutionary process and how they evolve over repeated prompting of the LLM. We show results for three benchmark problem classes and demonstrate novel insights. In particular, LLMs tend to generate more complex code with repeated prompting, but additional complexity can hurt algorithmic performance in some cases. Different LLMs have different coding ``styles'' and generated code tends to be dissimilar to other LLMs. These two findings suggest that using different LLMs inside the code evolution frameworks might produce higher performing code than using only one LLM.
58.5MLMar 31
mlr3mbo: Bayesian Optimization in RMarc Becker, Lennart Schneider, Martin Binder et al.
We present mlr3mbo, a comprehensive and modular toolbox for Bayesian optimization in R. mlr3mbo supports single- and multi-objective optimization, multi-point proposals, batch and asynchronous parallelization, input and output transformations, and robust error handling. While it can be used for many standard Bayesian optimization variants in applied settings, researchers can also construct custom BO algorithms from its flexible building blocks. In addition to an introduction to the software, its design principles, and its building blocks, the paper presents two extensive empirical evaluations of the software on the surrogate-based benchmark suite YAHPO Gym. To identify robust default configurations for both numeric and mixed-hierarchical optimization regimes, and to gain further insights into the respective impacts of individual settings, we run a coordinate descent search over the mlr3mbo configuration space and analyze its results. Furthermore, we demonstrate that mlr3mbo achieves state-of-the-art performance by benchmarking it against a wide range of optimizers, including HEBO, SMAC3, Ax, and Optuna.
LGNov 29, 2021
Automated Benchmark-Driven Design and Explanation of Hyperparameter OptimizersJulia Moosbauer, Martin Binder, Lennart Schneider et al.
Automated hyperparameter optimization (HPO) has gained great popularity and is an important ingredient of most automated machine learning frameworks. The process of designing HPO algorithms, however, is still an unsystematic and manual process: Limitations of prior work are identified and the improvements proposed are -- even though guided by expert knowledge -- still somewhat arbitrary. This rarely allows for gaining a holistic understanding of which algorithmic components are driving performance, and carries the risk of overlooking good algorithmic design choices. We present a principled approach to automated benchmark-driven algorithm design applied to multifidelity HPO (MF-HPO): First, we formalize a rich space of MF-HPO candidates that includes, but is not limited to common HPO algorithms, and then present a configurable framework covering this space. To find the best candidate automatically and systematically, we follow a programming-by-optimization approach and search over the space of algorithm candidates via Bayesian optimization. We challenge whether the found design choices are necessary or could be replaced by more naive and simpler ones by performing an ablation analysis. We observe that using a relatively simple configuration, in some ways simpler than established methods, performs very well as long as some critical configuration parameters have the right value.
MTRL-SCIJul 29, 2021
Bayesian Optimization in Materials Science: A SurveyLars Kotthoff, Hud Wahab, Patrick Johnson
Bayesian optimization is used in many areas of AI for the optimization of black-box processes and has achieved impressive improvements of the state of the art for a lot of applications. It intelligently explores large and complex design spaces while minimizing the number of evaluations of the expensive underlying process to be optimized. Materials science considers the problem of optimizing materials' properties given a large design space that defines how to synthesize or process them, with evaluations requiring expensive experiments or simulations -- a very similar setting. While Bayesian optimization is also a popular approach to tackle such problems, there is almost no overlap between the two communities that are investigating the same concepts. We present a survey of Bayesian optimization approaches in materials science to increase cross-fertilization and avoid duplication of work. We highlight common challenges and opportunities for joint research efforts.
APP-PHJul 29, 2021
Modeling and Optimizing Laser-Induced GrapheneLars Kotthoff, Sourin Dey, Vivek Jain et al.
A lot of technological advances depend on next-generation materials, such as graphene, which enables a raft of new applications, for example better electronics. Manufacturing such materials is often difficult; in particular, producing graphene at scale is an open problem. We provide a series of datasets that describe the optimization of the production of laser-induced graphene, an established manufacturing method that has shown great promise. We pose three challenges based on the datasets we provide -- modeling the behavior of laser-induced graphene production with respect to parameters of the production process, transferring models and knowledge between different precursor materials, and optimizing the outcome of the transformation over the space of possible production parameters. We present illustrative results, along with the code used to generate them, as a starting point for interested users. The data we provide represents an important real-world application of machine learning; to the best of our knowledge, no similar datasets are available.
LGJan 18, 2020
FlexiBO: A Decoupled Cost-Aware Multi-Objective Optimization Approach for Deep Neural NetworksMd Shahriar Iqbal, Jianhai Su, Lars Kotthoff et al.
The design of machine learning systems often requires trading off different objectives, for example, prediction error and energy consumption for deep neural networks (DNNs). Typically, no single design performs well in all objectives; therefore, finding Pareto-optimal designs is of interest. The search for Pareto-optimal designs involves evaluating designs in an iterative process, and the measurements are used to evaluate an acquisition function that guides the search process. However, measuring different objectives incurs different costs. For example, the cost of measuring the prediction error of DNNs is orders of magnitude higher than that of measuring the energy consumption of a pre-trained DNN, as it requires re-training the DNN. Current state-of-the-art methods do not consider this difference in objective evaluation cost, potentially incurring expensive evaluations of objective functions in the optimization process. In this paper, we develop a novel decoupled and cost-aware multi-objective optimization algorithm, we call Flexible Multi-Objective Bayesian Optimization (FlexiBO) to address this issue. FlexiBO weights the improvement of the hypervolume of the Pareto region by the measurement cost of each objective to balance the expense of collecting new information with the knowledge gained through objective evaluations, preventing us from performing expensive measurements for little to no gain. We evaluate FlexiBO on seven state-of-the-art DNNs for image recognition, natural language processing (NLP), and speech-to-text translation. Our results indicate that, given the same total experimental budget, FlexiBO discovers designs with 4.8$\%$ to 12.4$\%$ lower hypervolume error than the best method in state-of-the-art multi-objective optimization.
LGApr 4, 2019
Transfer Learning for Performance Modeling of Deep Neural Network SystemsMd Shahriar Iqbal, Lars Kotthoff, Pooyan Jamshidi
Modern deep neural network (DNN) systems are highly configurable with large a number of options that significantly affect their non-functional behavior, for example inference time and energy consumption. Performance models allow to understand and predict the effects of such configuration options on system behavior, but are costly to build because of large configuration spaces. Performance models from one environment cannot be transferred directly to another; usually models are rebuilt from scratch for different environments, for example different hardware. Recently, transfer learning methods have been applied to reuse knowledge from performance models trained in one environment in another. In this paper, we perform an empirical study to understand the effectiveness of different transfer learning strategies for building performance models of DNN systems. Our results show that transferring information on the most influential configuration options and their interactions is an effective way of reducing the cost to build performance models in new environments.
AIMay 3, 2018
The Algorithm Selection Competitions 2015 and 2017Marius Lindauer, Jan N. van Rijn, Lars Kotthoff
The algorithm selection problem is to choose the most suitable algorithm for solving a given problem instance. It leverages the complementarity between different approaches that is present in many areas of AI. We report on the state of the art in algorithm selection, as defined by the Algorithm Selection competitions in 2015 and 2017. The results of these competitions show how the state of the art improved over the years. We show that although performance in some cases is very good, there is still room for improvement in other cases. Finally, we provide insights into why some scenarios are hard, and pose challenges to the community on how to advance the current state of the art.
PLJul 11, 2017
Hot-Rodding the Browser Engine: Automatic Configuration of JavaScript CompilersChris Fawcett, Lars Kotthoff, Holger H. Hoos
Modern software systems in many application areas offer to the user a multitude of parameters, switches and other customisation hooks. Humans tend to have difficulties determining the best configurations for particular applications. Modern optimising compilers are an example of such software systems; their many parameters need to be tuned for optimal performance, but are often left at the default values for convenience. In this work, we automatically determine compiler parameter settings that result in optimised performance for particular applications. Specifically, we apply a state-of-the-art automated parameter configuration procedure based on cutting-edge machine learning and optimisation techniques to two prominent JavaScript compilers and demonstrate that significant performance improvements, more than 35% in some cases, can be achieved over the default parameter settings on a diverse set of benchmarks.
LGSep 18, 2016
mlr TutorialJulia Schiffner, Bernd Bischl, Michel Lang et al.
This document provides and in-depth introduction to the mlr framework for machine learning experiments in R.
AINov 12, 2015
ICON Challenge on Algorithm SelectionLars Kotthoff
We present the results of the ICON Challenge on Algorithm Selection.
AIOct 12, 2015
The Inductive Constraint Programming LoopChristian Bessiere, Luc De Raedt, Tias Guns et al.
Constraint programming is used for a variety of real-world optimisation problems, such as planning, scheduling and resource allocation problems. At the same time, one continuously gathers vast amounts of data about these problems. Current constraint programming software does not exploit such data to update schedules, resources and plans. We propose a new framework, that we call the Inductive Constraint Programming loop. In this approach data is gathered and analyzed systematically, in order to dynamically revise and adapt constraints and optimization criteria. Inductive Constraint Programming aims at bridging the gap between the areas of data mining and machine learning on the one hand, and constraint programming on the other hand.
AIJun 8, 2015
ASlib: A Benchmark Library for Algorithm SelectionBernd Bischl, Pascal Kerschke, Lars Kotthoff et al.
The task of algorithm selection involves choosing an algorithm from a set of algorithms on a per-instance basis in order to exploit the varying performance of algorithms over a set of instances. The algorithm selection problem is attracting increasing attention from researchers and practitioners in AI. Years of fruitful applications in a number of domains have resulted in a large amount of data, but the community lacks a standard format or repository for this data. This situation makes it difficult to share and compare different approaches effectively, as is done in other, more established fields. It also unnecessarily hinders new researchers who want to work in this area. To address this problem, we introduce a standardized format for representing algorithm selection scenarios and a repository that contains a growing number of data sets from the literature. Our format has been designed to be able to express a wide variety of different scenarios. Demonstrating the breadth and power of our platform, we describe a set of example experiments that build and evaluate algorithm selection models through a common interface. The results display the potential of algorithm selection to achieve significant performance improvements across a broad range of problems and algorithms.
AINov 18, 2013
Ranking Algorithms by PerformanceLars Kotthoff
A common way of doing algorithm selection is to train a machine learning model and predict the best algorithm from a portfolio to solve a particular problem. While this method has been highly successful, choosing only a single algorithm has inherent limitations -- if the choice was bad, no remedial action can be taken and parallelism cannot be exploited, to name but a few problems. In this paper, we investigate how to predict the ranking of the portfolio algorithms on a particular problem. This information can be used to choose the single best algorithm, but also to allocate resources to the algorithms according to their rank. We evaluate a range of approaches to predict the ranking of a set of algorithms on a problem. We furthermore introduce a framework for categorizing ranking predictions that allows to judge the expressiveness of the predictive output. Our experimental evaluation demonstrates on a range of data sets from the literature that it is beneficial to consider the relationship between algorithms when predicting rankings. We furthermore show that relatively naive approaches deliver rankings of good quality already.
AIJun 24, 2013
Proteus: A Hierarchical Portfolio of Solvers and TransformationsBarry Hurley, Lars Kotthoff, Yuri Malitsky et al.
In recent years, portfolio approaches to solving SAT problems and CSPs have become increasingly common. There are also a number of different encodings for representing CSPs as SAT instances. In this paper, we leverage advances in both SAT and CSP solving to present a novel hierarchical portfolio-based approach to CSP solving, which we call Proteus, that does not rely purely on CSP solvers. Instead, it may decide that it is best to encode a CSP problem instance into SAT, selecting an appropriate encoding and a corresponding SAT solver. Our experimental evaluation used an instance of Proteus that involved four CSP solvers, three SAT encodings, and six SAT solvers, evaluated on the most challenging problem instances from the CSP solver competitions, involving global and intensional constraints. We show that significant performance improvements can be achieved by Proteus obtained by exploiting alternative view-points and solvers for combinatorial problem-solving.
AIJun 5, 2013
LLAMA: Leveraging Learning to Automatically Manage AlgorithmsLars Kotthoff
Algorithm portfolio and selection approaches have achieved remarkable improvements over single solvers. However, the implementation of such systems is often highly customised and specific to the problem domain. This makes it difficult for researchers to explore different techniques for their specific problems. We present LLAMA, a modular and extensible toolkit implemented as an R package that facilitates the exploration of a range of different portfolio techniques on any problem domain. It implements the algorithm selection approaches most commonly used in the literature and leverages the extensive library of machine learning algorithms and techniques in R. We describe the current capabilities and limitations of the toolkit and illustrate its usage on a set of example SAT problems.
AIOct 30, 2012
Algorithm Selection for Combinatorial Search Problems: A SurveyLars Kotthoff
The Algorithm Selection Problem is concerned with selecting the best algorithm to solve a given problem on a case-by-case basis. It has become especially relevant in the last decade, as researchers are increasingly investigating how to identify the most suitable existing algorithm for solving a problem instead of developing new algorithms. This survey presents an overview of this work focusing on the contributions made in the area of combinatorial search problems, where Algorithm Selection techniques have achieved significant performance improvements. We unify and organise the vast literature according to criteria that determine Algorithm Selection systems in practice. The comprehensive classification of approaches identifies and analyses the different directions from which Algorithm Selection has been approached. This paper contrasts and compares different methods for solving the problem as well as ways of using these solutions. It closes by identifying directions of current and future research.
CESep 18, 2012
Qualitative Modelling via Constraint Programming: Past, Present and FutureThomas W. Kelsey, Lars Kotthoff, Christoffer A. Jefferson et al.
Qualitative modelling is a technique integrating the fields of theoretical computer science, artificial intelligence and the physical and biological sciences. The aim is to be able to model the behaviour of systems without estimating parameter values and fixing the exact quantitative dynamics. Traditional applications are the study of the dynamics of physical and biological systems at a higher level of abstraction than that obtained by estimation of numerical parameter values for a fixed quantitative model. Qualitative modelling has been studied and implemented to varying degrees of sophistication in Petri nets, process calculi and constraint programming. In this paper we reflect on the strengths and weaknesses of existing frameworks, we demonstrate how recent advances in constraint programming can be leveraged to produce high quality qualitative models, and we describe the advances in theory and technology that would be needed to make constraint programming the best option for scientific investigation in the broadest sense.
DCSep 16, 2012
A framework for large-scale distributed AI search across disconnected heterogeneous infrastructuresLars Kotthoff, Tom Kelsey, Martin McCaffery
We present a framework for a large-scale distributed eScience Artificial Intelligence search. Our approach is generic and can be used for many different problems. Unlike many other approaches, we do not require dedicated machines, homogeneous infrastructure or the ability to communicate between nodes. We give special consideration to the robustness of the framework, minimising the loss of effort even after total loss of infrastructure, and allowing easy verification of every step of the distribution process. In contrast to most eScience applications, the input data and specification of the problem is very small, being easily given in a paragraph of text. The unique challenges our framework tackles are related to the combinatorial explosion of the space that contains the possible solutions and the robustness of long-running computations. Not only is the time required to finish the computations unknown, but also the resource requirements may change during the course of the computation. We demonstrate the applicability of our framework by using it to solve a challenging and hitherto open problem in computational mathematics. The results demonstrate that our approach easily scales to computations of a size that would have been impossible to tackle in practice just a decade ago.