Benjamin Marais

CR
h-index: 4
6 papers
67 citations
Novelty: 48%
AI Score: 34

6 Papers

CR, Mar 24, 2022
MERLIN -- Malware Evasion with Reinforcement LearnINg

Tony Quertier, Benjamin Marais, Stéphane Morucci et al.

In addition to signature-based and heuristics-based detection techniques, machine learning (ML) is widely used to generalize to new, never-before-seen malicious software (malware). However, it has been demonstrated that ML models can be fooled by tricking the classifier into returning the incorrect label. These studies usually rely on a prediction score that is fragile to gradient-based attacks. In a more realistic setting, where an attacker has very little information about the outputs of a malware detection engine, only modest evasion rates are achieved. In this paper, we propose a method using reinforcement learning with the DQN and REINFORCE algorithms to challenge two state-of-the-art ML-based detection engines (MalConv & EMBER) and a commercial antivirus (AV) classified by Gartner as a leader. Our method combines several actions that modify a Windows Portable Executable (PE) file without breaking its functionality. It also identifies which actions perform better and compiles a detailed vulnerability report to help mitigate the evasion. We demonstrate that REINFORCE achieves very good evasion rates even on a commercial AV with limited available information.
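As a rough illustration of the REINFORCE idea mentioned in this abstract, the sketch below trains a softmax policy over a few file-modification actions against a black-box detector that only returns a hard label. The action names, the toy detector, and the hyperparameters are all hypothetical stand-ins; this is not the paper's environment, action set, or training setup.

```python
import math
import random

# Hypothetical action names; the paper's real action set is not listed here.
ACTIONS = ["append_overlay", "add_section", "rename_section"]


def toy_detector_evaded(action: str) -> bool:
    """Stand-in black-box detector: returns only a hard label (evaded or not).

    Assumption for illustration: only 'add_section' evades detection.
    """
    return action == "add_section"


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_reinforce(episodes: int = 500, lr: float = 0.5, seed: int = 0):
    """Single-step REINFORCE: logits += lr * reward * grad(log pi(action))."""
    rng = random.Random(seed)
    logits = [0.0] * len(ACTIONS)
    for _ in range(episodes):
        probs = softmax(logits)
        # Sample one action from the current policy.
        a = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        reward = 1.0 if toy_detector_evaded(ACTIONS[a]) else 0.0
        # For a softmax policy, d log pi(a) / d logit_k = 1[k == a] - pi(k).
        for k in range(len(ACTIONS)):
            grad_log_pi = (1.0 if k == a else 0.0) - probs[k]
            logits[k] += lr * reward * grad_log_pi
    return softmax(logits)


policy = train_reinforce()
best_action = ACTIONS[policy.index(max(policy))]
```

The learned probabilities also give a crude ranking of which actions help evasion, loosely in the spirit of the vulnerability report the abstract describes.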

CR, Jul 5, 2022
AI-based Malware and Ransomware Detection Models

Benjamin Marais, Tony Quertier, Stéphane Morucci

Cybercrime is one of the major digital threats of this century. In particular, ransomware attacks have significantly increased, resulting in global damage costs of tens of billions of dollars. In this paper, we train and test different Machine Learning and Deep Learning models for malware detection, malware classification and ransomware detection. We introduce a novel and flexible solution that combines two optimized models for malware and ransomware detection. Our results demonstrate some improvements in terms of both detection performance and flexibility. In particular, our combined models pave the way for easier future enhancements using specialized and thus interchangeable detection modules.

CR, Aug 5, 2024
A Lean Transformer Model for Dynamic Malware Analysis and Detection

Tony Quertier, Benjamin Marais, Grégoire Barrué et al.

Malware is a fast-growing threat to the modern computing world, and existing lines of defense are not efficient enough to address this issue. This is mainly because many prevention solutions rely on signature-based detection methods that can easily be circumvented by hackers. Therefore, there is a recurrent need for behavior-based analysis, where a suspicious file is run in a secured environment and its traces are collected into reports for analysis. Previous works have shown some success leveraging Neural Networks and API call sequences extracted from these execution reports. Recently, Large Language Models and Generative AI have demonstrated impressive capabilities, mainly in Natural Language Processing tasks, and promising applications in the cybersecurity field for both attackers and defenders. In this paper, we design an Encoder-Only model, based on the Transformer architecture, to detect malicious files by digesting their API call sequences collected by an execution emulation solution. We also limit the size of the model architecture and the number of its parameters, since Large Language Models are often considered overkill for specific tasks such as the one addressed here. In addition to achieving decent detection results, this approach has the advantage of reducing our carbon footprint by limiting training and inference times, and of facilitating technical operations with lower hardware requirements. We also analyze our results and highlight the limits and possible improvements when using Transformers to analyze malicious files.
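Before an encoder-only model can consume API call sequences, they typically have to be mapped to fixed-length integer IDs. The sketch below shows one common way to do that; the vocabulary scheme, the special tokens, and `max_len` are assumptions for illustration, not the preprocessing used in the paper.

```python
PAD, UNK = 0, 1  # reserved IDs for padding and unknown calls (assumed convention)


def build_vocab(sequences):
    """Assign an integer ID to every API call name seen in the training traces."""
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for seq in sequences:
        for call in seq:
            vocab.setdefault(call, len(vocab))
    return vocab


def encode(seq, vocab, max_len):
    """Truncate to max_len, map unseen calls to UNK, and right-pad with PAD."""
    ids = [vocab.get(call, UNK) for call in seq[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# Toy traces using real Windows API names; the traces themselves are made up.
traces = [
    ["CreateFileW", "WriteFile", "CloseHandle"],
    ["VirtualAlloc", "WriteProcessMemory", "CreateRemoteThread"],
]
vocab = build_vocab(traces)
encoded = encode(["VirtualAlloc", "RegSetValueExW"], vocab, max_len=4)
# encoded == [5, 1, 0, 0]: VirtualAlloc is ID 5, the unseen call maps to UNK
```

Capping `max_len` is one of the simplest levers for keeping training and inference costs low, which matches the abstract's emphasis on a lean model.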

CR, Aug 21, 2023
Optimized Deep Learning Models for Malware Detection under Concept Drift

William Maillet, Benjamin Marais

Despite the promising results of machine learning models in malicious file detection, they face the problem of concept drift due to the constant evolution of malware. This leads to declining performance over time, as the data distribution of new files differs from that of the training set, requiring frequent model updates. In this work, we propose a model-agnostic protocol to improve a baseline neural network against drift. We show the importance of feature reduction and of training with the most recent validation set possible, and propose a loss function named Drift-Resilient Binary Cross-Entropy, an improvement on the classical Binary Cross-Entropy that is more effective against drift. We train our model on the EMBER dataset, published in 2018, and evaluate it on a dataset of recent malicious files collected between 2020 and 2023. Our improved model shows promising results, detecting 15.2% more malware than a baseline model.
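For reference, the classical Binary Cross-Entropy that the abstract uses as its starting point can be written as below. This sketch covers only the standard loss; the Drift-Resilient variant is not specified in the abstract and is deliberately not reproduced or guessed at here. The `eps` clipping value is an assumption added for numerical safety.

```python
import math


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean classical BCE: -[y*log(p) + (1-y)*log(1-p)] averaged over samples."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


# A maximally uncertain classifier (p = 0.5) has a loss of ln(2) per sample.
loss = binary_cross_entropy([1, 0], [0.5, 0.5])
```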

CR, Jun 13, 2025
Semantic Preprocessing for LLM-based Malware Analysis

Benjamin Marais, Tony Quertier, Grégoire Barrué

In the context of malware analysis, numerous approaches rely on Artificial Intelligence to handle a large volume of data. However, these techniques focus on a data view (images, sequences) and not on an expert's view. To address this issue, we propose a new preprocessing method that focuses on expert knowledge to improve malware semantic analysis and result interpretability. The method creates JSON reports for Portable Executable files. These reports gather features from both static and behavioral analysis, and incorporate packer signature detection, MITRE ATT&CK and Malware Behavior Catalog (MBC) knowledge. The purpose of this preprocessing is to build a semantic representation of binary files that is understandable by malware analysts and that can enhance the explainability of AI models for malicious file analysis. Using this preprocessing to train a Large Language Model for malware classification, we achieve a weighted-average F1-score of 0.94 on a complex dataset, representative of market reality.
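To make the idea of a semantic JSON report more concrete, here is a minimal sketch of what such a report could look like. Every field name and the overall layout are hypothetical, as the abstract does not give the actual schema, and the sample values are illustrative only.

```python
import json


def build_report(sha256, packer, imports, api_calls, attack_techniques, mbc_behaviors):
    """Assemble one report dict; the key names are assumptions, not the paper's schema."""
    return {
        "sha256": sha256,
        "static": {"packer": packer, "imports": imports},
        "behavior": {"api_calls": api_calls},
        "knowledge": {"mitre_attack": attack_techniques, "mbc": mbc_behaviors},
    }


report = build_report(
    sha256="<sha256 of the file>",           # placeholder, not a real hash
    packer="UPX",                            # result of packer signature detection
    imports=["kernel32.dll", "advapi32.dll"],
    api_calls=["VirtualAlloc", "CreateRemoteThread"],
    attack_techniques=[{"id": "T1027.002", "name": "Software Packing"}],
    mbc_behaviors=["Debugger Detection"],    # illustrative behavior name
)
report_json = json.dumps(report, indent=2, sort_keys=True)
```

Serializing with sorted keys keeps the text stable across runs, which helps when the report is fed to a language model as plain text.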

CR, Jul 23, 2021
Malware Analysis with Artificial Intelligence and a Particular Attention on Results Interpretability

Benjamin Marais, Tony Quertier, Christophe Chesneau

Malware detection and analysis have been active research subjects in cybersecurity in recent years. Indeed, the development of obfuscation techniques, such as packing, requires special attention to detect recent variants of malware. The usual detection methods do not necessarily provide tools to interpret the results. Therefore, we propose a model based on the transformation of binary files into grayscale images, which achieves an accuracy rate of 88%. Furthermore, the proposed model can determine whether a sample is packed or encrypted with a precision of 85%. It allows us to analyze results and act appropriately. Also, by applying attention mechanisms to detection models, we can identify which parts of a file look suspicious. This kind of tool should be very useful for data analysts: it compensates for the lack of interpretability of common detection models, and it can help to understand why some malicious files go undetected.
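The binary-to-grayscale transformation mentioned in this abstract can be sketched as follows: each byte becomes one pixel intensity (0 to 255), and the byte stream is wrapped into rows of a fixed width. The default width and the zero padding of the last row are assumptions for illustration; the abstract does not state the paper's choices.

```python
import math


def bytes_to_grayscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Reshape raw bytes into rows of `width` pixels, zero-padding the last row."""
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    padded = data + bytes(height * width - len(data))  # bytes(n) is n zero bytes
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# The first bytes of a PE file start with the 'MZ' magic (77, 90).
image = bytes_to_grayscale(b"MZ\x90\x00\x03", width=2)
# image == [[77, 90], [144, 0], [3, 0]]
```

The resulting 2D array can then be passed to an image model such as a convolutional network.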