Marc Frappier

SE
h-index22
13papers
78citations
Novelty23%
AI Score23

13 Papers

LGApr 21, 2022
A Revealing Large-Scale Evaluation of Unsupervised Anomaly Detection Algorithms

Maxime Alvarez, Jean-Charles Verdier, D'Jeff K. Nkashama et al.

Anomaly detection has many applications ranging from bank-fraud detection and cyber-threat detection to equipment maintenance and health monitoring. However, choosing a suitable algorithm for a given application remains a challenging design decision, often informed by the literature on anomaly detection algorithms. We extensively reviewed twelve of the most popular unsupervised anomaly detection methods. We observed that, so far, they have been compared using inconsistent protocols - the choice of the class of interest or the positive class, the split of training and test data, and the choice of hyperparameters - leading to ambiguous evaluations. This observation led us to define a coherent evaluation protocol which we then used to produce an updated and more precise picture of the relative performance of the twelve methods on five widely used tabular datasets. While our evaluation cannot pinpoint a method that outperforms all the others on all datasets, it identifies those that stand out and revise misconceived knowledge about their relative performances.

CRJun 25, 2022
Robustness Evaluation of Deep Unsupervised Learning Algorithms for Intrusion Detection Systems

D'Jeff Kanda Nkashama, Arian Soltani, Jean-Charles Verdier et al.

Recently, advances in deep learning have been observed in various fields, including computer vision, natural language processing, and cybersecurity. Machine learning (ML) has demonstrated its ability as a potential tool for anomaly detection-based intrusion detection systems to build secure computer networks. Increasingly, ML approaches are widely adopted than heuristic approaches for cybersecurity because they learn directly from data. Data is critical for the development of ML systems, and becomes potential targets for attackers. Basically, data poisoning or contamination is one of the most common techniques used to fool ML models through data. This paper evaluates the robustness of six recent deep learning algorithms for intrusion detection on contaminated data. Our experiments suggest that the state-of-the-art algorithms used in this study are sensitive to data contamination and reveal the importance of self-defense against data perturbation when developing novel models, especially for intrusion detection systems.

LGJul 11, 2024
Deep Learning for Network Anomaly Detection under Data Contamination: Evaluating Robustness and Mitigating Performance Degradation

D'Jeff K. Nkashama, Jordan Masakuna Félicien, Arian Soltani et al.

Deep learning (DL) has emerged as a crucial tool in network anomaly detection (NAD) for cybersecurity. While DL models for anomaly detection excel at extracting features and learning patterns from data, they are vulnerable to data contamination -- the inadvertent inclusion of attack-related data in training sets presumed benign. This study evaluates the robustness of six unsupervised DL algorithms against data contamination using our proposed evaluation protocol. Results demonstrate significant performance degradation in state-of-the-art anomaly detection algorithms when exposed to contaminated data, highlighting the critical need for self-protection mechanisms in DL-based NAD models. To mitigate this vulnerability, we propose an enhanced auto-encoder with a constrained latent representation, allowing normal data to cluster more densely around a learnable center in the latent space. Our evaluation reveals that this approach exhibits improved resistance to data contamination compared to existing methods, offering a promising direction for more robust NAD systems.

STFeb 7, 2023
Characterizing Financial Market Coverage using Artificial Intelligence

Jean Marie Tshimula, D'Jeff K. Nkashama, Patrick Owusu et al.

This paper scrutinizes a database of over 4900 YouTube videos to characterize financial market coverage. Financial market coverage generates a large number of videos. Therefore, watching these videos to derive actionable insights could be challenging and complex. In this paper, we leverage Whisper, a speech-to-text model from OpenAI, to generate a text corpus of market coverage videos from Bloomberg and Yahoo Finance. We employ natural language processing to extract insights regarding language use from the market coverage. Moreover, we examine the prominent presence of trending topics and their evolution over time, and the impacts that some individuals and organizations have on the financial market. Our characterization highlights the dynamics of the financial market coverage and provides valuable insights reflecting broad discussions regarding recent financial events and the world economy.

LGAug 14, 2024
Impact of Inaccurate Contamination Ratio on Robust Unsupervised Anomaly Detection

Jordan F. Masakuna, DJeff Kanda Nkashama, Arian Soltani et al.

Training data sets intended for unsupervised anomaly detection, typically presumed to be anomaly-free, often contain anomalies (or contamination), a challenge that significantly undermines model performance. Most robust unsupervised anomaly detection models rely on contamination ratio information to tackle contamination. However, in reality, contamination ratio may be inaccurate. We investigate on the impact of inaccurate contamination ratio information in robust unsupervised anomaly detection. We verify whether they are resilient to misinformed contamination ratios. Our investigation on 6 benchmark data sets reveals that such models are not adversely affected by exposure to misinformation. In fact, they can exhibit improved performance when provided with such inaccurate contamination ratios.

SEMar 6, 2018Code
SysML/KAOS Domain Models and B System Specifications

Steve Jeffrey Tueno Fotso, Marc Frappier, Amel Mammar et al.

In this paper, we use a combination of the SysML/KAOS requirements engineering method, an extension of SysML, with concepts of the KAOS goal model, and of the B System formal method. Translation rules from a SysML/KAOS goal model to a B System specification have been defined. They allow to obtain a skeleton of the B System specification. To complete it, we have defined a language to express the domain model associated to the goal model. The translation of this domain model gives the structural part of the B System specification. The contribution of this paper is the description of translation rules from SysML/KAOS domain models to B System specifications. We also present the formal verification of these rules and we describe an open source tool that implements the languages and the rules. Finally, we provide a review of the application of the SysML/KAOS method on case studies such as for the formal specification of the hybrid ERTMS/ETCS level 3 standard.

CRNov 25, 2024
Preventing Jailbreak Prompts as Malicious Tools for Cybercriminals: A Cyber Defense Perspective

Jean Marie Tshimula, Xavier Ndona, D'Jeff K. Nkashama et al.

Jailbreak prompts pose a significant threat in AI and cybersecurity, as they are crafted to bypass ethical safeguards in large language models, potentially enabling misuse by cybercriminals. This paper analyzes jailbreak prompts from a cyber defense perspective, exploring techniques like prompt injection and context manipulation that allow harmful content generation, content filter evasion, and sensitive information extraction. We assess the impact of successful jailbreaks, from misinformation and automated social engineering to hazardous content creation, including bioweapons and explosives. To address these threats, we propose strategies involving advanced prompt analysis, dynamic safety protocols, and continuous model fine-tuning to strengthen AI resilience. Additionally, we highlight the need for collaboration among AI researchers, cybersecurity experts, and policymakers to set standards for protecting AI systems. Through case studies, we illustrate these cyber defense approaches, promoting responsible AI practices to maintain system integrity and public trust. \textbf{\color{red}Warning: This paper contains content which the reader may find offensive.}

SENov 10, 2024
ASTD Patterns for Integrated Continuous Anomaly Detection In Data Logs

Chaymae El Jabri, Marc Frappier, Pierre-Martin Tardif

This paper investigates the use of the ASTD language for ensemble anomaly detection in data logs. It uses a sliding window technique for continuous learning in data streams, coupled with updating learning models upon the completion of each window to maintain accurate detection and align with current data trends. It proposes ASTD patterns for combining learning models, especially in the context of unsupervised learning, which is commonly used for data streams. To facilitate this, a new ASTD operator is proposed, the Quantified Flow, which enables the seamless combination of learning models while ensuring that the specification remains concise. Our contribution is a specification pattern, highlighting the capacity of ASTDs to abstract and modularize anomaly detection systems. The ASTD language provides a unique approach to develop data flow anomaly detection systems, grounded in the combination of processes through the graphical representation of the language operators. This simplifies the design task for developers, who can focus primarily on defining the functional operations that constitute the system.

CLJun 26, 2024
Psychological Profiling in Cybersecurity: A Look at LLMs and Psycholinguistic Features

Jean Marie Tshimula, D'Jeff K. Nkashama, Jean Tshibangu Muabila et al.

The increasing sophistication of cyber threats necessitates innovative approaches to cybersecurity. In this paper, we explore the potential of psychological profiling techniques, particularly focusing on the utilization of Large Language Models (LLMs) and psycholinguistic features. We investigate the intersection of psychology and cybersecurity, discussing how LLMs can be employed to analyze textual data for identifying psychological traits of threat actors. We explore the incorporation of psycholinguistic features, such as linguistic patterns and emotional cues, into cybersecurity frameworks. Our research underscores the importance of integrating psychological perspectives into cybersecurity practices to bolster defense mechanisms against evolving threats.

SENov 8, 2018
The Generic SysML/KAOS Domain Metamodel

Steve Jeffrey Tueno Fotso, Marc Frappier, Régine Laleau et al.

This paper is related to the generalised/generic version of the SysML/KAOS domain metamodel and on translation and back propagation rules between the new domain models and B System specifications.

SEDec 20, 2017
Formal Representation of SysML/KAOS Domain Model (Complete Version)

Steve Tueno, Régine Laleau, Amel Mammar et al.

Nowadays, the usefulness of a formal language for ensuring the consistency of requirements is well established. The work presented here is part of the definition of a formally-grounded, model-based requirements engineering method for critical and complex systems. Requirements are captured through the SysML/KAOS method and the targeted formal specification is written using the Event-B method. Firstly, an Event-B skeleton is produced from the goal hierarchy provided by the SysML/KAOS goal model. This skeleton is then completed in a second step by the Event-B specification obtained from system application domain properties that gives rise to the system structure. Considering that the domain is represented using ontologies through the SysML/KAOS Domain Model method, is it possible to automatically produce the structural part of system Event-B models ? This paper proposes a set of generic rules that translate SysML/KAOS domain ontologies into an Event-B specification. The rules have been expressed, verified and validated through the Rodin tool using the Event-B method. They are illustrated through a case study dealing with a landing gear system. Our proposition makes it possible to automatically obtain, from a representation of the system application domain in the form of ontologies, the structural part of the Event-B specification which will be used to formally validate the consistency of system requirements.

SEOct 2, 2017
The SysML/KAOS Domain Modeling Approach

Steve Tueno, Régine Laleau, Amel Mammar et al.

A means of building safe critical systems consists of formally modeling the requirements formulated by stakeholders and ensuring their consistency with respect to application domain properties. This paper proposes a metamodel for an ontology modeling formalism based on OWL and PLIB. This modeling formalism is part of a method for modeling the domain of systems whose requirements are captured through SysML/KAOS. The formal semantics of SysML/KAOS goals are represented using Event-B specifications. Goals provide the set of events, while domain models will provide the structure of the system state of the Event-B specification. Our proposal is illustrated through a case study dealing with a Cycab localization component specification. The case study deals with the specification of a localization software component that uses GPS,Wi-Fi and sensor technologies for the realtime localization of the Cycab vehicle, an autonomous ground transportation system designed to be robust and completely independent.

SEJun 7, 2016
Formal refinement of extended state machines

Thomas Fayolle, Marc Frappier, Régine Laleau et al.

In a traditional formal development process, e.g. using the B method, the informal user requirements are (manually) translated into a global abstract formal specification. This translation is especially difficult to achieve. The Event-B method was developed to incrementally and formally construct such a specification using stepwise refinement. Each increment takes into account new properties and system aspects. In this paper, we propose to couple a graphical notation called Algebraic State-Transition Diagrams (ASTD) with an Event-B specification in order to provide a better understanding of the software behaviour. The dynamic behaviour is captured by the ASTD, which is based on automata and process algebra operators, while the data model is described by means of an Event-B specification. We propose a methodology to incrementally refine such specification couplings, taking into account new refinement relations and consistency conditions between the control specification and the data specification. We compare the specifications obtained using each approach for readability and proof complexity. The advantages and drawbacks of the traditional approach and of our methodology are discussed. The whole process is illustrated by a railway CBTC-like case study. Our approach is supported by tools for translating ASTD's into B and Event-B into B.