Axel Legay

h-index39

23papers

325citations

Novelty40%

AI Score25

Ranked #163,724 of 194,257 authors (top 84%)#1,935 in SE (top 64%)

23 Papers

2.9CRApr 12, 2022Code

Malware Analysis with Symbolic Execution and Graph Kernel

Charles-Henry Bertrand Van Ouytsel, Axel Legay

Malware analysis techniques are divided into static and dynamic analysis. Both techniques can be bypassed by circumvention techniques such as obfuscation. In a series of works, the authors have promoted the use of symbolic executions combined with machine learning to avoid such traps. Most of those works rely on natural graph-based representations that can then be plugged into graph-based learning algorithms such as Gspan. There are two main problems with this approach. The first one is in the cost of computing the graph. Indeed, working with graphs requires one to compute and representing the entire state-space of the file under analysis. As such computation is too cumbersome, the techniques often rely on developing strategies to compute a representative subgraph of the behaviors. Unfortunately, efficient graph-building strategies remain weakly explored. The second problem is in the classification itself. Graph-based machine learning algorithms rely on comparing the biggest common structures. This sidelines small but specific parts of the malware signature. In addition, it does not allow us to work with efficient algorithms such as support vector machine. We propose a new efficient open source toolchain for machine learning-based classification. We also explore how graph-kernel techniques can be used in the process. We focus on the 1-dimensional Weisfeiler-Lehman kernel, which can capture local similarities between graphs. Our experimental results show that our approach outperforms existing ones by an impressive factor.

4.3PFJan 25, 2012

Cross-entropy optimisation of importance sampling parameters for statistical model checking

Cyrille Jégourel, Axel Legay, Sean Sedwards

Statistical model checking avoids the exponential growth of states associated with probabilistic model checking by estimating properties from multiple executions of a system and by giving results within confidence bounds. Rare properties are often very important but pose a particular challenge for simulation-based approaches, hence a key objective under these circumstances is to reduce the number and length of simulations necessary to produce a given level of confidence. Importance sampling is a well-established technique that achieves this, however to maintain the advantages of statistical model checking it is necessary to find good importance sampling distributions without considering the entire state space. Motivated by the above, we present a simple algorithm that uses the notion of cross-entropy to find the optimal parameters for an importance sampling distribution. In contrast to previous work, our algorithm uses a low dimensional vector of parameters to define this distribution and thus avoids the often intractable explicit representation of a transition matrix. We show that our parametrisation leads to a unique optimum and can produce many orders of magnitude improvement in simulation efficiency. We demonstrate the efficacy of our methodology by applying it to models from reliability engineering and biochemistry.

5.1CYJun 14, 2023

Compatibility of Fairness Metrics with EU Non-Discrimination Laws: Demographic Parity & Conditional Demographic Disparity

Lisa Koutsoviti Koumeri, Magali Legast, Yasaman Yousefi et al.

Empirical evidence suggests that algorithmic decisions driven by Machine Learning (ML) techniques threaten to discriminate against legally protected groups or create new sources of unfairness. This work supports the contextual approach to fairness in EU non-discrimination legal framework and aims at assessing up to what point we can assure legal fairness through fairness metrics and under fairness constraints. For that, we analyze the legal notion of non-discrimination and differential treatment with the fairness definition Demographic Parity (DP) through Conditional Demographic Disparity (CDD). We train and compare different classifiers with fairness constraints to assess whether it is possible to reduce bias in the prediction while enabling the contextual approach to judicial interpretation practiced under EU non-discrimination laws. Our experimental results on three scenarios show that the in-processing bias mitigation algorithm leads to different performances in each of them. Our experiments and analysis suggest that AI-assisted decision-making can be fair from a legal perspective depending on the case at hand and the legal justification. These preliminary results encourage future work which will involve further case studies, metrics, and fairness notions.

3.6SESep 24, 2021

Test Scenario Generation for Context-Oriented Programs

Pierre Martou, Kim Mens, Benoît Duhoux et al.

Their highly adaptive nature and the combinatorial explosion of possible configurations makes testing context-oriented programs hard. We propose a methodology to automate the generation of test scenarios for developers of feature-based context-oriented programs. By using combinatorial interaction testing we generate a covering array from which a small but representative set of test scenarios can be inferred. By taking advantage of the explicit separation of contexts and features in such context-oriented programs, we can further rearrange the generated test scenarios to minimise the reconfiguration cost between subsequent scenarios. Finally, we explore how a previously generated test suite can be adapted incrementally when the system evolves to a new version. By validating these algorithms on a small use case, our initial results show that the proposed test generation approach is efficient and beneficial to developers to test and improve the design of context-oriented programs.

3.6SEAug 6, 2021

Analysis of Source Code Using UPPAAL

Mitja Kulczynski, Axel Legay, Dirk Nowotka et al.

In recent years there has been a considerable effort in optimising formal methods for application to code. This has been driven by tools such as CPAChecker, DIVINE, and CBMC. At the same time tools such as Uppaal have been massively expanding the realm of more traditional model checking technologies to include strategy synthesis algorithms - an aspect becoming more and more needed as software becomes increasingly parallel. Instead of reimplementing the advances made by Uppaal in this area, we suggest in this paper to develop a bridge between the source code and the engine of Uppaal. Our approach uses the widespread intermediate language LLVM and makes recent advances of the Uppaal ecosystem readily available to analysis of source code.

8.8CRMay 2, 2021

Analysis of Machine Learning Approaches to Packing Detection

Charles-Henry Bertrand Van Ouytsel, Thomas Given-Wilson, Jeremy Minet et al.

Packing is an obfuscation technique widely used by malware to hide the content and behavior of a program. Much prior research has explored how to detect whether a program is packed. This research includes a broad variety of approaches such as entropy analysis, syntactic signatures and more recently machine learning classifiers using various features. However, no robust results have indicated which algorithms perform best, or which features are most significant. This is complicated by considering how to evaluate the results since accuracy, cost, generalization capabilities, and other measures are all reasonable. This work explores eleven different machine learning approaches using 119 features to understand: which features are most significant for packing detection; which algorithms offer the best performance; and which algorithms are most economical.

6.6CRJan 21, 2021Code

Quantitative Security Risk Modeling and Analysis with RisQFLan

Maurice H. ter Beek, Axel Legay, Alberto Lluch Lafuente et al.

Domain-specific quantitative modeling and analysis approaches are fundamental in scenarios in which qualitative approaches are inappropriate or unfeasible. In this paper, we present a tool-supported approach to quantitative graph-based security risk modeling and analysis based on attack-defense trees. Our approach is based on QFLan, a successful domain-specific approach to support quantitative modeling and analysis of highly configurable systems, whose domain-specific components have been decoupled to facilitate the instantiation of the QFLan approach in the domain of graph-based security risk modeling and analysis. Our approach incorporates distinctive features from three popular kinds of attack trees, namely enhanced attack trees, capabilities-based attack trees and attack countermeasure trees, into the domain-specific modeling language. The result is a new framework, called RisQFLan, to support quantitative security risk modeling and analysis based on attack-defense diagrams. By offering either exact or statistical verification of probabilistic attack scenarios, RisQFLan constitutes a significant novel contribution to the existing toolsets in that domain. We validate our approach by highlighting the additional features offered by RisQFLan in three illustrative case studies from seminal approaches to graph-based security risk modeling analysis based on attack trees.

1.2PLDec 10, 2020

A Decision Tree Lifted Domain for Analyzing Program Families with Numerical Features (Extended Version)

Aleksandar S. Dimovski, Sven Apel, Axel Legay

Lifted (family-based) static analysis by abstract interpretation is capable of analyzing all variants of a program family simultaneously, in a single run without generating any of the variants explicitly. The elements of the underlying lifted analysis domain are tuples, which maintain one property per variant. Still, explicit property enumeration in tuples, one by one for all variants, immediately yields combinatorial explosion. This is particularly apparent in the case of program families that, apart from Boolean features, contain also numerical features with big domains, thus admitting astronomic configuration spaces. The key for an efficient lifted analysis is proper handling of variability-specific constructs of the language (e.g., feature-based runtime tests and #if directives). In this work, we introduce a new symbolic representation of the lifted abstract domain that can efficiently analyze program families with numerical features. This makes sharing between property elements corresponding to different variants explicitly possible. The elements of the new lifted domain are constraint-based decision trees, where decision nodes are labeled with linear constraints defined over numerical features and the leaf nodes belong to an existing single-program analysis domain. To illustrate the potential of this representation, we have implemented an experimental lifted static analyzer, called SPLNUM^2Analyzer, for inferring invariants of C programs. It uses existing numerical domains (e.g., intervals, octagons, polyhedra) from the APRON library as parameters. An empirical evaluation on benchmarks from SV-COMP and BusyBox yields promising preliminary results indicating that our decision trees-based approach is effective and outperforms the tuple-based approach, which is used as a baseline lifted analysis based on abstract interpretation.

4.3PLJun 4, 2020

Automatic Verification of LLVM Code

Axel Legay, Dirk Nowotka, Danny Bøgsted Poulsen

In this work we present our work in developing a software verification tool for llvm-code - Lodin - that incorporates both explicit-state model checking, statistical model checking and symbolic state model checking algorithms.

1.2LOMay 12, 2020

Featured Games

Uli Fahrenberg, Axel Legay

Feature-based SPL analysis and family-based model checking have seen rapid development. Many model checking problems can be reduced to two-player games on finite graphs. A prominent example is mu-calculus model checking, which is generally done by translating to parity games, but also many quantitative model-checking problems can be reduced to (quantitative) games. In their FASE'20 paper, ter Beek et al.\ introduce parity games with variability in order to develop family-based mu-calculus model checking of featured transition systems. We generalize their model to general featured games and show how these may be analysed in a family-based manner. We introduce featured reachability games, featured minimum reachability games, featured discounted games, featured energy games, and featured parity games. We show how to compute winners and values of such games in a family-based manner. We also show that all these featured games admit optimal featured strategies, which project to optimal strategies for any product. Further, we develop family-based algorithms, using late splitting, to compute winners, values, and optimal strategies for all the featured games we have introduced.

9.7CRJun 18, 2019

Secure Architectures Implementing Trusted Coalitions for Blockchained Distributed Learning (TCLearn)

Sebastien Lugan, Paul Desbordes, Luis Xavier Ramos Tormo et al.

Distributed learning across a coalition of organizations allows the members of the coalition to train and share a model without sharing the data used to optimize this model. In this paper, we propose new secure architectures that guarantee preservation of data privacy, trustworthy sequence of iterative learning and equitable sharing of the learned model among each member of the coalition by using adequate encryption and blockchain mechanisms. We exemplify its deployment in the case of the distributed optimization of a deep learning convolutional neural network trained on medical images.

1.2PLFeb 14, 2019

Variability Abstraction and Refinement for Game-based Lifted Model Checking of full CTL (Extended Version)

Aleksandar S. Dimovski, Axel Legay, Andrzej Wasowski

Variability models allow effective building of many custom model variants for various configurations. Lifted model checking for a variability model is capable of verifying all its variants simultaneously in a single run by exploiting the similarities between the variants. The computational cost of lifted model checking still greatly depends on the number of variants (the size of configuration space), which is often huge. One of the most promising approaches to fighting the configuration space explosion problem in lifted model checking are variability abstractions. In this work, we define a novel game-based approach for variability-specific abstraction and refinement for lifted model checking of the full CTL, interpreted over 3-valued semantics. We propose a direct algorithm for solving a 3-valued (abstract) lifted model checking game. In case the result of model checking an abstract variability model is indefinite, we suggest a new notion of refinement, which eliminates indefinite results. This provides an iterative incremental variability-specific abstraction and refinement framework, where refinement is applied only where indefinite results exist and definite results from previous iterations are reused.

2.3ITSep 8, 2018

Hybrid Statistical Estimation of Mutual Information and its Application to Information Flow

Fabrizio Biondi, Yusuke Kawamoto, Axel Legay et al.

Analysis of a probabilistic system often requires to learn the joint probability distribution of its random variables. The computation of the exact distribution is usually an exhaustive precise analysis on all executions of the system. To avoid the high computational cost of such an exhaustive search, statistical analysis has been studied to efficiently obtain approximate estimates by analyzing only a small but representative subset of the system's behavior. In this paper we propose a hybrid statistical estimation method that combines precise and statistical analyses to estimate mutual information, Shannon entropy, and conditional entropy, together with their confidence intervals. We show how to combine the analyses on different components of a discrete system with different accuracy to obtain an estimate for the whole system. The new method performs weighted statistical analysis with different sample sizes over different components and dynamically finds their optimal sample sizes. Moreover, it can reduce sample sizes by using prior knowledge about systems and a new abstraction-then-sampling technique based on qualitative analysis. To apply the method to the source code of a system, we show how to decompose the code into components and to determine the analysis method for each component by overviewing the implementation of those techniques in the HyLeak tool. We demonstrate with case studies that the new method outperforms the state of the art in quantifying information leakage.

10.1SEJul 26, 2017

A framework for quantitative modeling and analysis of highly (re)configurable systems

Maurice H. ter Beek, Axel Legay, Alberto Lluch Lafuente et al.

This paper presents our approach to the quantitative modeling and analysis of highly (re)configurable systems, such as software product lines. Different combinations of the optional features of such a system give rise to combinatorially many individual system variants. We use a formal modeling language that allows us to model systems with probabilistic behavior, possibly subject to quantitative feature constraints, and able to dynamically install, remove or replace features. More precisely, our models are defined in the probabilistic feature-oriented language QFLAN, a rich domain specific language (DSL) for systems with variability defined in terms of features. QFLAN specifications are automatically encoded in terms of a process algebra whose operational behavior interacts with a store of constraints, and hence allows to separate system configuration from system behavior. The resulting probabilistic configurations and behavior converge seamlessly in a semantics based on discrete-time Markov chains, thus enabling quantitative analysis. Our analysis is based on statistical model checking techniques, which allow us to scale to larger models with respect to precise probabilistic analysis techniques. The analyses we can conduct range from the likelihood of specific behavior to the expected average cost, in terms of feature attributes, of specific system variants. Our approach is supported by a novel Eclipse-based tool which includes state-of-the-art DSL utilities for QFLAN based on the Xtext framework as well as analysis plug-ins to seamlessly run statistical model checking analyses. We provide a number of case studies that have driven and validated the development of our framework.

2.3FLFeb 24, 2017

Featured Weighted Automata

Uli Fahrenberg, Axel Legay

A featured transition system is a transition system in which the transitions are annotated with feature expressions: Boolean expressions on a finite number of given features. Depending on its feature expression, each individual transition can be enabled when some features are present, and disabled for other sets of features. The behavior of a featured transition system hence depends on a given set of features. There are algorithms for featured transition systems which can check their properties for all sets of features at once, for example for LTL or CTL properties. Here we introduce a model of featured weighted automata which combines featured transition systems and (semiring-) weighted automata. We show that methods and techniques from weighted automata extend to featured weighted automata and devise algorithms to compute quantitative properties of featured weighted automata for all sets of features at once. We show applications to minimum reachability and to energy properties.

9.7SEMay 9, 2016

Verification of interlocking systems using statistical model checking

Quentin Cappart, Christophe Limbree, Pierre Schaus et al.

In the railway domain, an interlocking is the system ensuring safe train traffic inside a station by controlling its active elements such as the signals or points. Modern interlockings are configured using particular data, called application data, reflecting the track layout and defining the actions that the interlocking can take. The safety of the train traffic relies thereby on application data correctness, errors inside them can cause safety issues such as derailments or collisions. Given the high level of safety required by such a system, its verification is a critical concern. In addition to the safety, an interlocking must also ensure that availability properties, stating that no train would be stopped forever in a station, are satisfied. Most of the research dealing with this verification relies on model checking. However, due to the state space explosion problem, this approach does not scale for large stations. More recently, a discrete event simulation approach limiting the verification to a set of likely scenarios, was proposed. The simulation enables the verification of larger stations, but with no proof that all the interesting scenarios are covered by the simulation. In this paper, we apply an intermediate statistical model checking approach, offering both the advantages of model checking and simulation. Even if exhaustiveness is not obtained, statistical model checking evaluates with a parametrizable confidence the reliability and the availability of the entire system.

3.8SEJul 29, 2015

Dependability Analysis of Control Systems using SystemC and Statistical Model Checking

Van Chan Ngo, Axel Legay

Stochastic Petri nets are commonly used for modeling distributed systems in order to study their performance and dependability. This paper proposes a realization of stochastic Petri nets in SystemC for modeling large embedded control systems. Then statistical model checking is used to analyze the dependability of the constructed model. Our verification framework allows users to express a wide range of useful properties to be verified which is illustrated through a case study.

8.8SEApr 14, 2015

Quantitative Analysis of Probabilistic Models of Software Product Lines with Statistical Model Checking

Maurice H. ter Beek, Axel Legay, Alberto Lluch Lafuente et al.

We investigate the suitability of statistical model checking techniques for analysing quantitative properties of software product line models with probabilistic aspects. For this purpose, we enrich the feature-oriented language FLan with action rates, which specify the likelihood of exhibiting particular behaviour or of installing features at a specific moment or in a specific order. The enriched language (called PFLan) allows us to specify models of software product lines with probabilistic configurations and behaviour, e.g. by considering a PFLan semantics based on discrete-time Markov chains. The Maude implementation of PFLan is combined with the distributed statistical model checker MultiVeStA to perform quantitative analyses of a simple product line case study. The presented analyses include the likelihood of certain behaviour of interest (e.g. product malfunctioning) and the expected average cost of products.

1.2SYJun 12, 2015

Model Checking as Control: Feedback Control for Statistical Model Checking of Cyber-Physical Systems

Kenan Kalajdzic, Cyrille Jegourel, Ezio Bartocci et al.

We introduce feedback-control statistical system checking (FC-SSC), a new approach to statistical model checking that exploits principles of feedback-control for the analysis of cyber-physical systems (CPS). FC-SSC uses stochastic system identification to learn a CPS model, importance sampling to estimate the CPS state, and importance splitting to control the CPS so that the probability that the CPS satisfies a given property can be efficiently inferred. We illustrate the utility of FC-SSC on two example applications, each of which is simple enough to be easily understood, yet complex enough to exhibit all of FC-SCC's features. To the best of our knowledge, FC-SSC is the first statistical system checker to efficiently estimate the probability of rare events in realistic CPS applications or in any complex probabilistic program whose model is either not available, or is infeasible to derive through static-analysis techniques.

6.9SEDec 2, 2014

Dynamic Verification of SystemC with Statistical Model Checking

Van Chan Ngo, Axel Legay, Jean Quilbeuf

Many embedded and real-time systems have a inherent probabilistic behaviour (sensors data, unreliable hardware,...). In that context, it is crucial to evaluate system properties such as "the probability that a particular hardware fails". Such properties can be evaluated by using probabilistic model checking. However, this technique fails on models representing realistic embedded and real-time systems because of the state space explosion. To overcome this problem, we propose a verification framework based on Statistical Model Checking. Our framework is able to evaluate probabilistic and temporal properties on large systems modelled in SystemC, a standard system-level modelling language. It is fully implemented as an extension of the Plasma-lab statistical model checker. We illustrate our approach on a multi-lift system case study.

2.0CLMar 17, 2014

Measuring Global Similarity between Texts

Uli Fahrenberg, Fabrizio Biondi, Kevin Corre et al.

We propose a new similarity measure between texts which, contrary to the current state-of-the-art approaches, takes a global view of the texts to be compared. We have implemented a tool to compute our textual distance and conducted experiments on several corpuses of texts. The experiments show that our methods can reliably identify different global types of texts.

4.1SENov 13, 2013

Proceedings 1st Workshop on Advances in Systems of Systems

Kim G. Larsen, Axel Legay, Ulrik Nyman

This volume contains the proceedings of the first workshop on Advances in Systems of Systems (AISOS'13), held in Roma, Italy, March 16. System-of-Systems describes the large scale integration of many independent self-contained systems to satisfy global needs or multi-system requests. Examples are smart grid, intelligent buildings, smart cities, transport systems, etc. There is a need for new modeling formalisms, analysis methods and tools to help make trade-off decisions during design and evolution avoiding leading to sub-optimal design and rework during integration and in service. The workshop should focus on the modeling and analysis of System of Systems. AISOS'13 aims to gather people from different communities in order to encourage exchange of methods and views.

5.1DSOct 14, 2013

Scalable Verification of Markov Decision Processes

Axel Legay, Sean Sedwards, Louis-Marie Traonouez

Markov decision processes (MDP) are useful to model concurrent process optimisation problems, but verifying them with numerical methods is often intractable. Existing approximative approaches do not scale well and are limited to memoryless schedulers. Here we present the basis of scalable verification for MDPSs, using an O(1) memory representation of history-dependent schedulers. We thus facilitate scalable learning techniques and the use of massively parallel verification.