Anna V. Kononova

h-index17

29papers

525citations

Novelty39%

AI Score54

Ranked #10,001 of 194,257 authors (top 5%)#18 in NE (top 2%)

29 Papers

10.4LGAug 7, 2024

Online Model-based Anomaly Detection in Multivariate Time Series: Taxonomy, Survey, Research Challenges and Future Directions

Lucas Correia, Jan-Christoph Goos, Philipp Klein et al.

Time-series anomaly detection plays an important role in engineering processes, like development, manufacturing and other operations involving dynamic systems. These processes can greatly benefit from advances in the field, as state-of-the-art approaches may aid in cases involving, for example, highly dimensional data. To provide the reader with understanding of the terminology, this survey introduces a novel taxonomy where a distinction between online and offline, and training and inference is made. Additionally, it presents the most popular data sets and evaluation metrics used in the literature, as well as a detailed analysis. Furthermore, this survey provides an extensive overview of the state-of-the-art model-based online semi- and unsupervised anomaly detection approaches for multivariate time-series data, categorising them into different model families and other properties. The biggest research challenge revolves around benchmarking, as currently there is no reliable way to compare different approaches against one another. This problem is two-fold: on the one hand, public data sets suffers from at least one fundamental flaw, while on the other hand, there is a lack of intuitive and representative evaluation metrics in the field. Moreover, the way most publications choose a detection threshold disregards real-world conditions, which hinders the application in the real world. To allow for tangible advances in the field, these issues must be addressed in future work.

4.8NEMar 7, 2022Code

The importance of being constrained: dealing with infeasible solutions in Differential Evolution and beyond

Anna V. Kononova, Diederick Vermetten, Fabio Caraffini et al.

We argue that results produced by a heuristic optimisation algorithm cannot be considered reproducible unless the algorithm fully specifies what should be done with solutions generated outside the domain, even in the case of simple box constraints. Currently, in the field of heuristic optimisation, such specification is rarely mentioned or investigated due to the assumed triviality or insignificance of this question. Here, we demonstrate that, at least in algorithms based on Differential Evolution, this choice induces notably different behaviours - in terms of performance, disruptiveness and population diversity. This is shown theoretically (where possible) for standard Differential Evolution in the absence of selection pressure and experimentally for the standard and state-of-the-art Differential Evolution variants on special test function $f_0$ and BBOB benchmarking suite, respectively. Moreover, we demonstrate that the importance of this choice quickly grows with problem's dimensionality. Different Evolution is not at all special in this regard - there is no reason to presume that other heuristic optimisers are not equally affected by the aforementioned algorithmic choice. Thus, we urge the field of heuristic optimisation to formalise and adopt the idea of a new algorithmic component in heuristic optimisers, which we call here a strategy of dealing with infeasible solutions. This component needs to be consistently (a) specified in algorithmic descriptions to guarantee reproducibility of results, (b) studied to better understand its impact on algorithm's performance in a wider sense and (c) included in the (automatic) algorithmic design. All of these should be done even for problems with box constraints.

8.3NEApr 4, 2023Code

Deep-BIAS: Detecting Structural Bias using Explainable AI

Bas van Stein, Diederick Vermetten, Fabio Caraffini et al.

Evaluating the performance of heuristic optimisation algorithms is essential to determine how well they perform under various conditions. Recently, the BIAS toolbox was introduced as a behaviour benchmark to detect structural bias (SB) in search algorithms. The toolbox can be used to identify biases in existing algorithms, as well as to test for bias in newly developed algorithms. In this article, we introduce a novel and explainable deep-learning expansion of the BIAS toolbox, called Deep-BIAS. Where the original toolbox uses 39 statistical tests and a Random Forest model to predict the existence and type of SB, the Deep-BIAS method uses a trained deep-learning model to immediately detect the strength and type of SB based on the raw performance distributions. Through a series of experiments with a variety of structurally biased scenarios, we demonstrate the effectiveness of Deep-BIAS. We also present the results of using the toolbox on 336 state-of-the-art optimisation algorithms, which showed the presence of various types of structural bias, particularly towards the centre of the objective space or exhibiting discretisation behaviour. The Deep-BIAS method outperforms the BIAS toolbox both in detecting bias and for classifying the type of SB. Furthermore, explanations can be derived using XAI techniques.

9.8LGSep 5, 2023Code

MA-VAE: Multi-head Attention-based Variational Autoencoder Approach for Anomaly Detection in Multivariate Time-series Applied to Automotive Endurance Powertrain Testing

Lucas Correia, Jan-Christoph Goos, Philipp Klein et al.

A clear need for automatic anomaly detection applied to automotive testing has emerged as more and more attention is paid to the data recorded and manual evaluation by humans reaches its capacity. Such real-world data is massive, diverse, multivariate and temporal in nature, therefore requiring modelling of the testee behaviour. We propose a variational autoencoder with multi-head attention (MA-VAE), which, when trained on unlabelled data, not only provides very few false positives but also manages to detect the majority of the anomalies presented. In addition to that, the approach offers a novel way to avoid the bypass phenomenon, an undesirable behaviour investigated in literature. Lastly, the approach also introduces a new method to remap individual windows to a continuous time series. The results are presented in the context of a real-world industrial data set and several experiments are undertaken to further investigate certain aspects of the proposed model. When configured properly, it is 9% of the time wrong when an anomaly is flagged and discovers 67% of the anomalies present. Also, MA-VAE has the potential to perform well with only a fraction of the training and validation subset, however, to extract it, a more sophisticated threshold estimation method is required.

2.4AIJan 15

How does downsampling affect needle electromyography signals? A generalisable workflow for understanding downsampling effects on high-frequency time series

Mathieu Cherpitel, Janne Luijten, Thomas Bäck et al.

Automated analysis of needle electromyography (nEMG) signals is emerging as a tool to support the detection of neuromuscular diseases (NMDs), yet the signals' high and heterogeneous sampling rates pose substantial computational challenges for feature-based machine-learning models, particularly for near real-time analysis. Downsampling offers a potential solution, but its impact on diagnostic signal content and classification performance remains insufficiently understood. This study presents a workflow for systematically evaluating information loss caused by downsampling in high-frequency time series. The workflow combines shape-based distortion metrics with classification outcomes from available feature-based machine learning models and feature space analysis to quantify how different downsampling algorithms and factors affect both waveform integrity and predictive performance. We use a three-class NMD classification task to experimentally evaluate the workflow. We demonstrate how the workflow identifies downsampling configurations that preserve diagnostic information while substantially reducing computational load. Analysis of shape-based distortion metrics showed that shape-aware downsampling algorithms outperform standard decimation, as they better preserve peak structure and overall signal morphology. The results provide practical guidance for selecting downsampling configurations that enable near real-time nEMG analysis and highlight a generalisable workflow that can be used to balance data reduction with model performance in other high-frequency time-series applications as well.

6.4LGJul 9, 2024Code

TeVAE: A Variational Autoencoder Approach for Discrete Online Anomaly Detection in Variable-state Multivariate Time-series Data

Lucas Correia, Jan-Christoph Goos, Philipp Klein et al.

As attention to recorded data grows in the realm of automotive testing and manual evaluation reaches its limits, there is a growing need for automatic online anomaly detection. This real-world data is complex in many ways and requires the modelling of testee behaviour. To address this, we propose a temporal variational autoencoder (TeVAE) that can detect anomalies with minimal false positives when trained on unlabelled data. Our approach also avoids the bypass phenomenon and introduces a new method to remap individual windows to a continuous time series. Furthermore, we propose metrics to evaluate the detection delay and root-cause capability of our approach and present results from experiments on a real-world industrial data set. When properly configured, TeVAE flags anomalies only 6% of the time wrongly and detects 65% of anomalies present. It also has the potential to perform well with a smaller training and validation subset but requires a more sophisticated threshold estimation method.

4.5LGMay 20

Objective-Induced Bias and Search Dynamics in Multiobjective Unsupervised Feature Selection

Mathieu Cherpitel, Thomas Bäck, Martijn R. Tannemaat et al.

Unsupervised feature selection is commonly formulated as a multiobjective optimisation problem that jointly optimises subset quality and subset size. Yet the behaviour of this formulation depends critically on the choice of evaluation objective, the direction of subset-size regularisation, and the initialisation strategy. We study these factors in a controlled setting using a synthetic dataset with known informative, redundant, and irrelevant feature types. Six formulations are compared by combining three evaluation objectives: accuracy, silhouette score, and PCA reconstruction loss with subset-size minimisation or maximisation. The results show that formulation strongly affects both search dynamics and the quality of the resulting Pareto front. Silhouette-based formulations exhibit a strong bias toward trivial low-cardinality solutions and remain weak proxies for predictive performance. In contrast, the proposed PCA loss objective produces compact subsets with test accuracy comparable to subsets obtained by directly optimising supervised accuracy. Overall, the study shows that objective design is central to effective multiobjective unsupervised feature selection.

1.2MAMar 3

Multi-Agent Influence Diagrams to Hybrid Threat Modeling

Maarten C. Vonk, Anna V. Kononova, Thomas Bäck et al.

Western governments have adopted an assortment of counter-hybrid threat measures to defend against hostile actions below the conventional military threshold. The impact of these measures is unclear because of the ambiguity of hybrid threats, their cross-domain nature, and uncertainty about how countermeasures shape adversarial behavior. This paper offers a novel approach to clarifying this impact by unifying previously bifurcating hybrid threat modeling methods through a (multi-agent) influence diagram framework. The model balances the costs of countermeasures, their ability to dissuade the adversary from executing hybrid threats, and their potential to mitigate the impact of hybrid threats. We run 1000 semi-synthetic variants of a real-world-inspired scenario simulating the strategic interaction between attacking agent A and defending agent B over a cyber attack on critical infrastructure to explore the effectiveness of a set of five different counter-hybrid threat measures. Counter-hybrid measures range from strengthening resilience and denial of the adversary's ability to execute a hybrid threat to dissuasion through the threat of punishment. Our analysis primarily evaluates the overarching characteristics of counter-hybrid threat measures. This approach allows us to generalize the effectiveness of these measures and examine parameter impact sensitivity. In addition, we discuss policy relevance and outline future research avenues.

6.0ROApr 22

Benefits of Low-Cost Bio-Inspiration in the Age of Overparametrization

Kevin Godin-Dubois, Anil Yaman, Anna V. Kononova

While Central Pattern Generators (CPGs) and Multi-Layer Perceptrons (MLP) are widely used paradigms in robot control, few systematic studies have been performed on the relative merits of large parameter spaces. In contexts where input and output spaces are small and performance is bounded, having more parameters to optimize may actively hinder the learning process instead of empowering it. To empirically measure this, we submit a given robot morphology, with limited proprioceptive capabilities, to controller optimization under two bio-inspired paradigms (CPGs and MLPs) with evolutionary- and reinforcement- trainer protocols. By varying parameter spaces across multiple reward functions, we observe that shallow MLPs and densely connected CPGs result in better performance when compared to deeper MLPs or Actor-Critic architectures. To account for the relationship between said performance and the number of parameters, we introduce a Parameter Impact metric which demonstrates that the additional parameters required by the reinforcement technique do not translate into better performance, thus favouring evolutionary strategies.

4.4AIJan 29

LLaMEA-SAGE: Guiding Automated Algorithm Design with Structural Feedback from Explainable AI

Niki van Stein, Anna V. Kononova, Lars Kotthoff et al.

Large language models have enabled automated algorithm design (AAD) by generating optimization algorithms directly from natural-language prompts. While evolutionary frameworks such as LLaMEA demonstrate strong exploratory capabilities across the algorithm design space, their search dynamics are entirely driven by fitness feedback, leaving substantial information about the generated code unused. We propose a mechanism for guiding AAD using feedback constructed from graph-theoretic and complexity features extracted from the abstract syntax trees of the generated algorithms, based on a surrogate model learned over an archive of evaluated solutions. Using explainable AI techniques, we identify features that substantially affect performance and translate them into natural-language mutation instructions that steer subsequent LLM-based code generation without restricting expressivity. We propose LLaMEA-SAGE, which integrates this feature-driven guidance into LLaMEA, and evaluate it across several benchmarks. We show that the proposed structured guidance achieves the same performance faster than vanilla LLaMEA in a small controlled experiment. In a larger-scale experiment using the MA-BBOB suite from the GECCO-MA-BBOB competition, our guided approach achieves superior performance compared to state-of-the-art AAD methods. These results demonstrate that signals derived from code can effectively bias LLM-driven algorithm evolution, bridging the gap between code structure and human-understandable performance feedback in automated algorithm design.

9.6AIMay 15, 2025Code

Reasoning Capabilities of Large Language Models on Dynamic Tasks

Annie Wong, Thomas Bäck, Aske Plaat et al.

Large language models excel on static benchmarks, but their ability as self-learning agents in dynamic environments remains unclear. We evaluate three prompting strategies: self-reflection, heuristic mutation, and planning across dynamic tasks with open-source models. We find that larger models generally outperform smaller ones, but that strategic prompting can close this performance gap. Second, an overly long prompt can negatively impact smaller models on basic reactive tasks, while larger models show more robust behaviour. Third, advanced prompting techniques primarily benefit smaller models on complex games, but offer less improvement for already high-performing large language models. Yet, we find that advanced reasoning methods yield highly variable outcomes: while capable of significantly improving performance when reasoning and decision-making align, they also introduce instability and can lead to big performance drops. Compared to human performance, our findings reveal little evidence of true emergent reasoning. Instead, large language model performance exhibits persistent limitations in areas like planning and spatial coordination, suggesting that large language models still suffer fundamental shortcomings that may not be fully overcome through self-reflective prompting alone. Reasoning is a multi-faceted task, and while methods like Chain-of-thought improve multi-step reasoning on math word problems, our findings using dynamic benchmarks highlight important shortcomings in general reasoning capabilities, indicating a need to move beyond static benchmarks to capture the complexity of reasoning.

16.5NEJan 31, 2024Code

Explainable Benchmarking for Iterative Optimization Heuristics

Niki van Stein, Diederick Vermetten, Anna V. Kononova et al.

Benchmarking heuristic algorithms is vital to understand under which conditions and on what kind of problems certain algorithms perform well. In most current research into heuristic optimization algorithms, only a very limited number of scenarios, algorithm configurations and hyper-parameter settings are explored, leading to incomplete and often biased insights and results. This paper presents a novel approach we call explainable benchmarking. Introducing the IOH-Xplainer software framework, for analyzing and understanding the performance of various optimization algorithms and the impact of their different components and hyper-parameters. We showcase the framework in the context of two modular optimization frameworks. Through this framework, we examine the impact of different algorithmic components and configurations, offering insights into their performance across diverse scenarios. We provide a systematic method for evaluating and interpreting the behaviour and efficiency of iterative optimization heuristics in a more transparent and comprehensible manner, allowing for better benchmarking and algorithm design.

20.0NEMar 20, 2025

Code Evolution Graphs: Understanding Large Language Model Driven Design of Algorithms

Niki van Stein, Anna V. Kononova, Lars Kotthoff et al.

Large Language Models (LLMs) have demonstrated great promise in generating code, especially when used inside an evolutionary computation framework to iteratively optimize the generated algorithms. However, in some cases they fail to generate competitive algorithms or the code optimization stalls, and we are left with no recourse because of a lack of understanding of the generation process and generated codes. We present a novel approach to mitigate this problem by enabling users to analyze the generated codes inside the evolutionary process and how they evolve over repeated prompting of the LLM. We show results for three benchmark problem classes and demonstrate novel insights. In particular, LLMs tend to generate more complex code with repeated prompting, but additional complexity can hurt algorithmic performance in some cases. Different LLMs have different coding ``styles'' and generated code tends to be dissimilar to other LLMs. These two findings suggest that using different LLMs inside the code evolution frameworks might produce higher performing code than using only one LLM.

15.9SEApr 28, 2025

BLADE: Benchmark suite for LLM-driven Automated Design and Evolution of iterative optimisation heuristics

Niki van Stein, Anna V. Kononova, Haoran Yin et al.

The application of Large Language Models (LLMs) for Automated Algorithm Discovery (AAD), particularly for optimisation heuristics, is an emerging field of research. This emergence necessitates robust, standardised benchmarking practices to rigorously evaluate the capabilities and limitations of LLM-driven AAD methods and the resulting generated algorithms, especially given the opacity of their design process and known issues with existing benchmarks. To address this need, we introduce BLADE (Benchmark suite for LLM-driven Automated Design and Evolution), a modular and extensible framework specifically designed for benchmarking LLM-driven AAD methods in a continuous black-box optimisation context. BLADE integrates collections of benchmark problems (including MA-BBOB and SBOX-COST among others) with instance generators and textual descriptions aimed at capability-focused testing, such as generalisation, specialisation and information exploitation. It offers flexible experimental setup options, standardised logging for reproducibility and fair comparison, incorporates methods for analysing the AAD process (e.g., Code Evolution Graphs and various visualisation approaches) and facilitates comparison against human-designed baselines through integration with established tools like IOHanalyser and IOHexplainer. BLADE provides an `out-of-the-box' solution to systematically evaluate LLM-driven AAD approaches. The framework is demonstrated through two distinct use cases exploring mutation prompt strategies and function specialisation.

4.1RONov 20, 2024

AMaze: An intuitive benchmark generator for fast prototyping of generalizable agents

Kevin Godin-Dubois, Karine Miras, Anna V. Kononova

Traditional approaches to training agents have generally involved a single, deterministic environment of minimal complexity to solve various tasks such as robot locomotion or computer vision. However, agents trained in static environments lack generalization capabilities, limiting their potential in broader scenarios. Thus, recent benchmarks frequently rely on multiple environments, for instance, by providing stochastic noise, simple permutations, or altogether different settings. In practice, such collections result mainly from costly human-designed processes or the liberal use of random number generators. In this work, we introduce AMaze, a novel benchmark generator in which embodied agents must navigate a maze by interpreting visual signs of arbitrary complexities and deceptiveness. This generator promotes human interaction through the easy generation of feature-specific mazes and an intuitive understanding of the resulting agents' strategies. As a proof-of-concept, we demonstrate the capabilities of the generator in a simple, fully discrete case with limited deceptiveness. Agents were trained under three different regimes (one-shot, scaffolding, interactive), and the results showed that the latter two cases outperform direct training in terms of generalization capabilities. Indeed, depending on the combination of generalization metric, training regime, and algorithm, the median gain ranged from 50% to 100% and maximal performance was achieved through interactive training, thereby demonstrating the benefits of a controllable human-in-the-loop benchmark generator.

13.2NEJul 4, 2025

Behaviour Space Analysis of LLM-driven Meta-heuristic Discovery

Niki van Stein, Haoran Yin, Anna V. Kononova et al.

We investigate the behaviour space of meta-heuristic optimisation algorithms automatically generated by Large Language Model driven algorithm discovery methods. Using the Large Language Evolutionary Algorithm (LLaMEA) framework with a GPT o4-mini LLM, we iteratively evolve black-box optimisation heuristics, evaluated on 10 functions from the BBOB benchmark suite. Six LLaMEA variants, featuring different mutation prompt strategies, are compared and analysed. We log dynamic behavioural metrics including exploration, exploitation, convergence and stagnation measures, for each run, and analyse these via visual projections and network-based representations. Our analysis combines behaviour-based projections, Code Evolution Graphs built from static code features, performance convergence curves, and behaviour-based Search Trajectory Networks. The results reveal clear differences in search dynamics and algorithm structures across LLaMEA configurations. Notably, the variant that employs both a code simplification prompt and a random perturbation prompt in a 1+1 elitist evolution strategy, achieved the best performance, with the highest Area Over the Convergence Curve. Behaviour-space visualisations show that higher-performing algorithms exhibit more intensive exploitation behaviour and faster convergence with less stagnation. Our findings demonstrate how behaviour-space analysis can explain why certain LLM-designed heuristics outperform others and how LLM-driven algorithm discovery navigates the open-ended and complex search space of algorithms. These findings provide insights to guide the future design of adaptive LLM-driven algorithm generators.

4.6LGNov 21, 2024Code

PATH: A Discrete-sequence Dataset for Evaluating Online Unsupervised Anomaly Detection Approaches for Multivariate Time Series

Lucas Correia, Jan-Christoph Goos, Thomas Bäck et al.

Benchmarking anomaly detection approaches for multivariate time series is a challenging task due to a lack of high-quality datasets. Current publicly available datasets are too small, not diverse and feature trivial anomalies, which hinders measurable progress in this research area. We propose a solution: a diverse, extensive, and non-trivial dataset generated via state-of-the-art simulation tools that reflects realistic behaviour of an automotive powertrain, including its multivariate, dynamic and variable-state properties. Additionally, our dataset represents a discrete-sequence problem, which remains unaddressed by previously-proposed solutions in literature. To cater for both unsupervised and semi-supervised anomaly detection settings, as well as time series generation and forecasting, we make different versions of the dataset available, where training and test subsets are offered in contaminated and clean versions, depending on the task. We also provide baseline results from a selection of approaches based on deterministic and variational autoencoders, as well as a non-parametric approach. As expected, the baseline experimentation shows that the approaches trained on the semi-supervised version of the dataset outperform their unsupervised counterparts, highlighting a need for approaches more robust to contaminated training data. Furthermore, results show that the threshold used can have a large influence on detection performance, hence more work needs to be invested in methods to find a suitable threshold without the need for labelled data.

6.4LGFeb 10, 2024Code

Solving Deep Reinforcement Learning Tasks with Evolution Strategies and Linear Policy Networks

Annie Wong, Jacob de Nobel, Thomas Bäck et al.

Although deep reinforcement learning methods can learn effective policies for challenging problems such as Atari games and robotics tasks, algorithms are complex, and training times are often long. This study investigates how Evolution Strategies perform compared to gradient-based deep reinforcement learning methods. We use Evolution Strategies to optimize the weights of a neural network via neuroevolution, performing direct policy search. We benchmark both deep policy networks and networks consisting of a single linear layer from observations to actions for three gradient-based methods, such as Proximal Policy Optimization. These methods are evaluated against three classical Evolution Strategies and Augmented Random Search, which all use linear policy networks. Our results reveal that Evolution Strategies can find effective linear policies for many reinforcement learning benchmark tasks, unlike deep reinforcement learning methods that can only find successful policies using much larger networks, suggesting that current benchmarks are easier to solve than previously assumed. Interestingly, Evolution Strategies also achieve results comparable to gradient-based deep reinforcement learning algorithms for higher-complexity tasks. Furthermore, we find that by directly accessing the memory state of the game, Evolution Strategies can find successful policies in Atari that outperform the policies found by Deep Q-Learning. Evolution Strategies also outperform Augmented Random Search in most benchmarks, demonstrating superior sample efficiency and robustness in training linear policy networks.

5.8NEFeb 9, 2024

A Functional Analysis Approach to Symbolic Regression

Kirill Antonov, Roman Kalkreuth, Kaifeng Yang et al.

Symbolic regression (SR) poses a significant challenge for randomized search heuristics due to its reliance on the synthesis of expressions for input-output mappings. Although traditional genetic programming (GP) algorithms have achieved success in various domains, they exhibit limited performance when tree-based representations are used for SR. To address these limitations, we introduce a novel SR approach called Fourier Tree Growing (FTG) that draws insights from functional analysis. This new perspective enables us to perform optimization directly in a different space, thus avoiding intricate symbolic expressions. Our proposed algorithm exhibits significant performance improvements over traditional GP methods on a range of classical one-dimensional benchmarking problems. To identify and explain limiting factors of GP and FTG, we perform experiments on a large-scale polynomials benchmark with high-order polynomials up to degree 100. To the best of the authors' knowledge, this work represents the pioneering application of functional analysis in addressing SR problems. The superior performance of the proposed algorithm and insights into the limitations of GP open the way for further advancing GP for SR and related areas of explainable machine learning.

3.3AINov 20, 2025

From Performance to Understanding: A Vision for Explainable Automated Algorithm Design

Niki van Stein, Anna V. Kononova, Thomas Bäck

Automated algorithm design is entering a new phase: Large Language Models can now generate full optimisation (meta)heuristics, explore vast design spaces and adapt through iterative feedback. Yet this rapid progress is largely performance-driven and opaque. Current LLM-based approaches rarely reveal why a generated algorithm works, which components matter or how design choices relate to underlying problem structures. This paper argues that the next breakthrough will come not from more automation, but from coupling automation with understanding from systematic benchmarking. We outline a vision for explainable automated algorithm design, built on three pillars: (i) LLM-driven discovery of algorithmic variants, (ii) explainable benchmarking that attributes performance to components and hyperparameters and (iii) problem-class descriptors that connect algorithm behaviour to landscape structure. Together, these elements form a closed knowledge loop in which discovery, explanation and generalisation reinforce each other. We argue that this integration will shift the field from blind search to interpretable, class-specific algorithm design, accelerating progress while producing reusable scientific insight into when and why optimisation strategies succeed.

4.1LGSep 6, 2025

DQS: A Low-Budget Query Strategy for Enhancing Unsupervised Data-driven Anomaly Detection Approaches

Lucas Correia, Jan-Christoph Goos, Thomas Bäck et al.

Truly unsupervised approaches for time series anomaly detection are rare in the literature. Those that exist suffer from a poorly set threshold, which hampers detection performance, while others, despite claiming to be unsupervised, need to be calibrated using a labelled data subset, which is often not available in the real world. This work integrates active learning with an existing unsupervised anomaly detection method by selectively querying the labels of multivariate time series, which are then used to refine the threshold selection process. To achieve this, we introduce a novel query strategy called the dissimilarity-based query strategy (DQS). DQS aims to maximise the diversity of queried samples by evaluating the similarity between anomaly scores using dynamic time warping. We assess the detection performance of DQS in comparison to other query strategies and explore the impact of mislabelling, a topic that is underexplored in the literature. Our findings indicate that DQS performs best in small-budget scenarios, though the others appear to be more robust when faced with mislabelling. Therefore, in the real world, the choice of query strategy depends on the expertise of the oracle and the number of samples they are willing to label. Regardless, all query strategies outperform the unsupervised threshold even in the presence of mislabelling. Thus, whenever it is feasible to query an oracle, employing an active learning-based threshold is recommended.

3.3NEApr 26, 2024

A Deep Dive into Effects of Structural Bias on CMA-ES Performance along Affine Trajectories

Niki van Stein, Sarah L. Thomson, Anna V. Kononova

To guide the design of better iterative optimisation heuristics, it is imperative to understand how inherent structural biases within algorithm components affect the performance on a wide variety of search landscapes. This study explores the impact of structural bias in the modular Covariance Matrix Adaptation Evolution Strategy (modCMA), focusing on the roles of various modulars within the algorithm. Through an extensive investigation involving 435,456 configurations of modCMA, we identified key modules that significantly influence structural bias of various classes. Our analysis utilized the Deep-BIAS toolbox for structural bias detection and classification, complemented by SHAP analysis for quantifying module contributions. The performance of these configurations was tested on a sequence of affine-recombined functions, maintaining fixed optimum locations while gradually varying the landscape features. Our results demonstrate an interplay between module-induced structural bias and algorithm performance across different landscape characteristics.

1.4CVJan 17, 2022Code

Using Machine Learning to Detect Rotational Symmetries from Reflectional Symmetries in 2D Images

Koen Ponse, Anna V. Kononova, Maria Loleyt et al.

Automated symmetry detection is still a difficult task in 2021. However, it has applications in computer vision, and it also plays an important part in understanding art. This paper focuses on aiding the latter by comparing different state-of-the-art automated symmetry detection algorithms. For one of such algorithms aimed at reflectional symmetries, we propose post-processing improvements to find localised symmetries in images, improve the selection of detected symmetries and identify another symmetry type (rotational). In order to detect rotational symmetries, we contribute a machine learning model which detects rotational symmetries based on provided reflection symmetry axis pairs. We demonstrate and analyze the performance of the extended algorithm to detect localised symmetries and the machine learning model to classify rotational symmetries.

21.5LGJun 29, 2021

Deep Multiagent Reinforcement Learning: Challenges and Directions

Annie Wong, Thomas Bäck, Anna V. Kononova et al.

This paper surveys the field of deep multiagent reinforcement learning. The combination of deep neural networks with reinforcement learning has gained increased traction in recent years and is slowly shifting the focus from single-agent to multiagent environments. Dealing with multiple agents is inherently more complex as (a) the future rewards depend on multiple players' joint actions and (b) the computational complexity increases. We present the most common multiagent problem representations and their main challenges, and identify five research areas that address one or more of these challenges: centralised training and decentralised execution, opponent modelling, communication, efficient coordination, and reward shaping. We find that many computational studies rely on unrealistic assumptions or are not generalisable to other settings; they struggle to overcome the curse of dimensionality or nonstationarity. Approaches from psychology and sociology capture promising relevant behaviours, such as communication and coordination, to help agents achieve better performance in multiagent settings. We suggest that, for multiagent reinforcement learning to be successful, future research should address these challenges with an interdisciplinary approach to open up new possibilities in multiagent reinforcement learning.

1.2CEMay 21, 2021

Addressing the Multiplicity of Solutions in Optical Lens Design as a Niching Evolutionary Algorithms Computational Challenge

Anna V. Kononova, Ofer M. Shir, Teus Tukker et al.

Optimal Lens Design constitutes a fundamental, long-standing real-world optimization challenge. Potentially large number of optima, rich variety of critical points, as well as solid understanding of certain optimal designs per simple problem instances, provide altogether the motivation to address it as a niching challenge. This study applies established Niching-CMA-ES heuristic to tackle this design problem (6-dimensional Cooke triplet) in a simulation-based fashion. The outcome of employing Niching-CMA-ES `out-of-the-box' proves successful, and yet it performs best when assisted by a local searcher which accurately drives the search into optima. The obtained search-points are corroborated based upon concrete knowledge of this problem-instance, accompanied by gradient and Hessian calculations for validation. We extensively report on this computational campaign, which overall resulted in (i) the location of 19 out of 21 known minima within a single run, (ii) the discovery of 540 new optima. These are new minima similar in shape to 21 theoretical solutions, but some of them have better merit function value (unknown heretofore), (iii) the identification of numerous infeasibility pockets throughout the domain (also unknown heretofore). We conclude that niching mechanism is well-suited to address this problem domain, and hypothesize on the apparent multidimensional structures formed by the attained new solutions.

4.3MEMay 10, 2021

Is there Anisotropy in Structural Bias?

Diederick Vermetten, Anna V. Kononova, Fabio Caraffini et al.

Structural Bias (SB) is an important type of algorithmic deficiency within iterative optimisation heuristics. However, methods for detecting structural bias have not yet fully matured, and recent studies have uncovered many interesting questions. One of these is the question of how structural bias can be related to anisotropy. Intuitively, an algorithm that is not isotropic would be considered structurally biased. However, there have been cases where algorithms appear to only show SB in some dimensions. As such, we investigate whether these algorithms actually exhibit anisotropy, and how this impacts the detection of SB. We find that anisotropy is very rare, and even in cases where it is present, there are clear tests for SB which do not rely on any assumptions of isotropy, so we can safely expand the suite of SB tests to encompass these kinds of deficiencies not found by the original tests. We propose several additional testing procedures for SB detection and aim to motivate further research into the creation of a robust portfolio of tests. This is crucial since no single test will be able to work effectively with all types of SB we identify.

6.0NEApr 22, 2020

Differential evolution outside the box

Anna V. Kononova, Fabio Caraffini, Thomas Bäck

This paper investigates how often the popular configurations of Differential Evolution generate solutions outside the feasible domain. Following previous publications in the field, we argue that what the algorithm does with such solutions and how often this has to happen is important for the overall performance of the algorithm and interpretation of results. Based on observations therein, we conclude that significantly more solutions than what is usually assumed by practitioners need to undergo some sort of 'correction' to conform with the definition of the problem's search domain. A wide range of popular Differential Evolution configurations is considered in this study. Conclusions are made regarding the effect the Differential Evolution components and parameter settings have on the distribution of proportions of infeasible solutions generated in a series of independent runs. Results shown in this study suggest strong dependencies between proportions of generated infeasible solutions and every aspect mentioned above. Further investigation of the distribution of proportions of generated infeasible solutions is required.

14.1NEJan 18, 2019

Infeasibility and structural bias in Differential Evolution

Fabio Caraffini, Anna V. Kononova, David Corne

This paper thoroughly investigates a range of popular DE configurations to identify components responsible for the emergence of structural bias - recently identified tendency of the algorithm to prefer some regions of the search space for reasons directly unrelated to the objective function values. Such tendency was already studied in GA and PSO where a connection was established between the strength of structural bias and population sizes and potential weaknesses of these algorithms was highlighted. For DE, this study goes further and extends the range of aspects that can contribute to presence of structural bias by including algorithmic component which is usually overlooked - constraint handling technique. A wide range of DE configurations were subjected to the protocol for testing for bias. Results suggest that triggering mechanism for the bias in DE differs to the one previously found for GA and PSO - no clear dependency on population size exists. Setting of DE parameters is based on a separate study which on its own leads to interesting directions of new research. Overall, DE turned out to be robust against structural bias - only DE/current-to-best/1/bin is clearly biased but this effect is mitigated by the use of penalty constraint handling technique.

19.1NEAug 22, 2014

Structural bias in population-based algorithms

Anna V. Kononova, David W. Corne, Philippe De Wilde et al.

Challenging optimisation problems are abundant in all areas of science. Since the 1950s, scientists have developed ever-diversifying families of black box optimisation algorithms designed to address any optimisation problem, requiring only that quality of a candidate solution is calculated via a fitness function specific to the problem. For such algorithms to be successful, at least three properties are required: an effective informed sampling strategy, that guides generation of new candidates on the basis of fitnesses and locations of previously visited candidates; mechanisms to ensure efficiency, so that same candidates are not repeatedly visited; absence of structural bias, which, if present, would predispose the algorithm towards limiting its search to some regions of solution space. The first two of these properties have been extensively investigated, however the third is little understood. In this article we provide theoretical and empirical analyses that contribute to the understanding of structural bias. We prove a theorem concerning dynamics of population variance in the case of real-valued search spaces. This reveals how structural bias can manifest as non-uniform clustering of population over time. Theory predicts that structural bias is exacerbated with increasing population size and problem difficulty. These predictions reveal two previously unrecognised aspects of structural bias. Respectively, increasing population size, though ostensibly promoting diversity, will magnify any inherent structural bias, and effects of structural bias are more apparent when faced with difficult problems. Our theoretical result also suggests that two commonly used approaches to enhancing exploration, increasing population size and increasing disruptiveness of search operators, have quite distinct implications in terms of structural bias.