Michele Carminati

h-index16

11papers

803citations

Novelty49%

AI Score40

Ranked #72,332 of 194,257 authors (top 37%)#1,731 in CR (top 26%)

11 Papers

3.3QUANT-PHFeb 16, 2025Code

Evaluating the Potential of Quantum Machine Learning in Cybersecurity: A Case-Study on PCA-based Intrusion Detection Systems

Armando Bellante, Tommaso Fioravanti, Michele Carminati et al.

Quantum computing promises to revolutionize our understanding of the limits of computation, and its implications in cryptography have long been evident. Today, cryptographers are actively devising post-quantum solutions to counter the threats posed by quantum-enabled adversaries. Meanwhile, quantum scientists are innovating quantum protocols to empower defenders. However, the broader impact of quantum computing and quantum machine learning (QML) on other cybersecurity domains still needs to be explored. In this work, we investigate the potential impact of QML on cybersecurity applications of traditional ML. First, we explore the potential advantages of quantum computing in machine learning problems specifically related to cybersecurity. Then, we describe a methodology to quantify the future impact of fault-tolerant QML algorithms on real-world problems. As a case study, we apply our approach to standard methods and datasets in network intrusion detection, one of the most studied applications of machine learning in cybersecurity. Our results provide insight into the conditions for obtaining a quantum advantage and the need for future quantum hardware and software advancements.

3.4CRJun 17

Lifecycle-Aware Dynamic Analysis for Secure ML Model Execution

Gabriele Digregorio, Marco Di Gennaro, Francesco Pastore et al.

The growing reliance on pre-trained Machine Learning (ML) models has introduced new attack surfaces. Recent vulnerabilities demonstrate that malicious behavior can be embedded within model artifacts, often bypassing existing defenses. Current model-scanning solutions primarily rely on static, format-specific rules or known attack signatures, which limit their ability to generalize across frameworks and to detect novel exploitation paths. In contrast, we propose a solution that focuses on the effects an attack has on the host system executing the model and builds on foundational intuitions about ML model execution. In particular, we observe that ML models operate within well-defined lifecycle phases and that, within each phase, interactions with the host system are highly structured and predictable. We translate these intuitions into Moat, a dynamic lifecycle-aware approach for securing ML model execution, and instantiate this design in Re-Moat, our reference implementation. We evaluate Re-Moat across multiple ML frameworks using 77,974 real-world model artifacts from the Hugging Face Hub, 31 Proofs-of-Concept (PoCs) from CVEs, and 334 models from a state-of-the-art dataset, and compare it against state-of-the-art model-scanning solutions. Our results show that our approach detects all evaluated attack classes while maintaining a close-to-zero false-positive rate, validating our intuitions and motivating dynamic analysis for securing ML model execution.

2.3AIApr 17, 2024

A Secure and Trustworthy Network Architecture for Federated Learning Healthcare Applications

Antonio Boiano, Marco Di Gennaro, Luca Barbieri et al.

Federated Learning (FL) has emerged as a promising approach for privacy-preserving machine learning, particularly in sensitive domains such as healthcare. In this context, the TRUSTroke project aims to leverage FL to assist clinicians in ischemic stroke prediction. This paper provides an overview of the TRUSTroke FL network infrastructure. The proposed architecture adopts a client-server model with a central Parameter Server (PS). We introduce a Docker-based design for the client nodes, offering a flexible solution for implementing FL processes in clinical settings. The impact of different communication protocols (HTTP or MQTT) on FL network operation is analyzed, with MQTT selected for its suitability in FL scenarios. A control plane to support the main operations required by FL processes is also proposed. The paper concludes with an analysis of security aspects of the FL architecture, addressing potential threats and proposing mitigation strategies to increase the trustworthiness level.

3.6CRMay 31, 2025Code

PackHero: A Scalable Graph-based Approach for Efficient Packer Identification

Marco Di Gennaro, Mario D'Onghia, Mario Polino et al.

Anti-analysis techniques, particularly packing, challenge malware analysts, making packer identification fundamental. Existing packer identifiers have significant limitations: signature-based methods lack flexibility and struggle against dynamic evasion, while Machine Learning approaches require extensive training data, limiting scalability and adaptability. Consequently, achieving accurate and adaptable packer identification remains an open problem. This paper presents PackHero, a scalable and efficient methodology for identifying packers using a novel static approach. PackHero employs a Graph Matching Network and clustering to match and group Call Graphs from programs packed with known packers. We evaluate our approach on a public dataset of malware and benign samples packed with various packers, demonstrating its effectiveness and scalability across varying sample sizes. PackHero achieves a macro-average F1-score of 93.7% with just 10 samples per packer, improving to 98.3% with 100 samples. Notably, PackHero requires fewer samples to achieve stable performance compared to other Machine Learning-based tools. Overall, PackHero matches the performance of State-of-the-art signature-based tools, outperforming them in handling Virtualization-based packers such as Themida/Winlicense, with a recall of 100%.

3.6CRMay 31, 2025

Amatriciana: Exploiting Temporal GNNs for Robust and Efficient Money Laundering Detection

Marco Di Gennaro, Francesco Panebianco, Marco Pianta et al.

Money laundering is a financial crime that poses a serious threat to financial integrity and social security. The growing number of transactions makes it necessary to use automatic tools that help law enforcement agencies detect such criminal activity. In this work, we present Amatriciana, a novel approach based on Graph Neural Networks to detect money launderers inside a graph of transactions by considering temporal information. Amatriciana uses the whole graph of transactions without splitting it into several time-based subgraphs, exploiting all relational information in the dataset. Our experiments on a public dataset reveal that the model can learn from a limited amount of data. Furthermore, when more data is available, the model outperforms other State-of-the-art approaches; in particular, Amatriciana decreases the number of False Positives (FPs) while detecting many launderers. In summary, Amatriciana achieves an F1 score of 0.76. In addition, it lowers the FPs by 55% with respect to other State-of-the-art models.

6.4CRSep 8, 2025

When Secure Isn't: Assessing the Security of Machine Learning Model Sharing

Gabriele Digregorio, Marco Di Gennaro, Stefano Zanero et al.

The rise of model-sharing through frameworks and dedicated hubs makes Machine Learning significantly more accessible. Despite their benefits, these tools expose users to underexplored security risks, while security awareness remains limited among both practitioners and developers. To enable a more security-conscious culture in Machine Learning model sharing, in this paper we evaluate the security posture of frameworks and hubs, assess whether security-oriented mechanisms offer real protection, and survey how users perceive the security narratives surrounding model sharing. Our evaluation shows that most frameworks and hubs address security risks partially at best, often by shifting responsibility to the user. More concerningly, our analysis of frameworks advertising security-oriented settings and complete model sharing uncovered six 0-day vulnerabilities enabling arbitrary code execution. Through this analysis, we debunk the misconceptions that the model-sharing problem is largely solved and that its security can be guaranteed by the file format used for sharing. As expected, our survey shows that the surrounding security narrative leads users to consider security-oriented settings as trustworthy, despite the weaknesses shown in this work. From this, we derive takeaways and suggestions to strengthen the security of model-sharing ecosystems.

3.6CRAug 1, 2025

LeakSealer: A Semisupervised Defense for LLMs Against Prompt Injection and Leakage Attacks

Francesco Panebianco, Stefano Bonfanti, Francesco Trovò et al.

The generalization capabilities of Large Language Models (LLMs) have led to their widespread deployment across various applications. However, this increased adoption has introduced several security threats, notably in the forms of jailbreaking and data leakage attacks. Additionally, Retrieval Augmented Generation (RAG), while enhancing context-awareness in LLM responses, has inadvertently introduced vulnerabilities that can result in the leakage of sensitive information. Our contributions are twofold. First, we introduce a methodology to analyze historical interaction data from an LLM system, enabling the generation of usage maps categorized by topics (including adversarial interactions). This approach further provides forensic insights for tracking the evolution of jailbreaking attack patterns. Second, we propose LeakSealer, a model-agnostic framework that combines static analysis for forensic insights with dynamic defenses in a Human-In-The-Loop (HITL) pipeline. This technique identifies topic groups and detects anomalous patterns, allowing for proactive defense mechanisms. We empirically evaluate LeakSealer under two scenarios: (1) jailbreak attempts, employing a public benchmark dataset, and (2) PII leakage, supported by a curated dataset of labeled LLM interactions. In the static setting, LeakSealer achieves the highest precision and recall on the ToxicChat dataset when identifying prompt injection. In the dynamic setting, PII leakage detection achieves an AUPRC of $0.97$, significantly outperforming baselines such as Llama Guard.

3.6CRJun 12, 2025

Assessing the Resilience of Automotive Intrusion Detection Systems to Adversarial Manipulation

Stefano Longari, Paolo Cerracchio, Michele Carminati et al.

The security of modern vehicles has become increasingly important, with the controller area network (CAN) bus serving as a critical communication backbone for various Electronic Control Units (ECUs). The absence of robust security measures in CAN, coupled with the increasing connectivity of vehicles, makes them susceptible to cyberattacks. While intrusion detection systems (IDSs) have been developed to counter such threats, they are not foolproof. Adversarial attacks, particularly evasion attacks, can manipulate inputs to bypass detection by IDSs. This paper extends our previous work by investigating the feasibility and impact of gradient-based adversarial attacks performed with different degrees of knowledge against automotive IDSs. We consider three scenarios: white-box (attacker with full system knowledge), grey-box (partial system knowledge), and the more realistic black-box (no knowledge of the IDS' internal workings or data). We evaluate the effectiveness of the proposed attacks against state-of-the-art IDSs on two publicly available datasets. Additionally, we study effect of the adversarial perturbation on the attack impact and evaluate real-time feasibility by precomputing evasive payloads for timed injection based on bus traffic. Our results demonstrate that, besides attacks being challenging due to the automotive domain constraints, their effectiveness is strongly dependent on the dataset quality, the target IDS, and the attacker's degree of knowledge.

3.6CRJun 9, 2025Code

TimberStrike: Dataset Reconstruction Attack Revealing Privacy Leakage in Federated Tree-Based Systems

Marco Di Gennaro, Giovanni De Lucia, Stefano Longari et al.

Federated Learning has emerged as a privacy-oriented alternative to centralized Machine Learning, enabling collaborative model training without direct data sharing. While extensively studied for neural networks, the security and privacy implications of tree-based models remain underexplored. This work introduces TimberStrike, an optimization-based dataset reconstruction attack targeting horizontally federated tree-based models. Our attack, carried out by a single client, exploits the discrete nature of decision trees by using split values and decision paths to infer sensitive training data from other clients. We evaluate TimberStrike on State-of-the-Art federated gradient boosting implementations across multiple frameworks, including Flower, NVFlare, and FedTree, demonstrating their vulnerability to privacy breaches. On a publicly available stroke prediction dataset, TimberStrike consistently reconstructs between 73.05% and 95.63% of the target dataset across all implementations. We further analyze Differential Privacy, showing that while it partially mitigates the attack, it also significantly degrades model performance. Our findings highlight the need for privacy-preserving mechanisms specifically designed for tree-based Federated Learning systems, and we provide preliminary insights into their design.

3.6CRJun 3, 2025

How stealthy is stealthy? Studying the Efficacy of Black-Box Adversarial Attacks in the Real World

Francesco Panebianco, Mario D'Onghia, Stefano Zanero aand Michele Carminati

Deep learning systems, critical in domains like autonomous vehicles, are vulnerable to adversarial examples (crafted inputs designed to mislead classifiers). This study investigates black-box adversarial attacks in computer vision. This is a realistic scenario, where attackers have query-only access to the target model. Three properties are introduced to evaluate attack feasibility: robustness to compression, stealthiness to automatic detection, and stealthiness to human inspection. State-of-the-Art methods tend to prioritize one criterion at the expense of others. We propose ECLIPSE, a novel attack method employing Gaussian blurring on sampled gradients and a local surrogate model. Comprehensive experiments on a public dataset highlight ECLIPSE's advantages, demonstrating its contribution to the trade-off between the three properties.

2.3CRDec 28, 2024

An Anomaly Detection System Based on Generative Classifiers for Controller Area Network

Chunheng Zhao, Stefano Longari, Michele Carminati et al.

As electronic systems become increasingly complex and prevalent in modern vehicles, securing onboard networks is crucial, particularly as many of these systems are safety-critical. Researchers have demonstrated that modern vehicles are susceptible to various types of attacks, enabling attackers to gain control and compromise safety-critical electronic systems. Consequently, several Intrusion Detection Systems (IDSs) have been proposed in the literature to detect such cyber-attacks on vehicles. This paper introduces a novel generative classifier-based Intrusion Detection System (IDS) designed for anomaly detection in automotive networks, specifically focusing on the Controller Area Network (CAN). Leveraging variational Bayes, our proposed IDS utilizes a deep latent variable model to construct a causal graph for conditional probabilities. An auto-encoder architecture is utilized to build the classifier to estimate conditional probabilities, which contribute to the final prediction probabilities through Bayesian inference. Comparative evaluations against state-of-the-art IDSs on a public Car-hacking dataset highlight our proposed classifier's superior performance in improving detection accuracy and F1-score. The proposed IDS demonstrates its efficacy by outperforming existing models with limited training data, providing enhanced security assurance for automotive systems.