Fabio Roli

LG
h-index75
81papers
9,046citations
Novelty47%
AI Score59

81 Papers

CRMar 7, 2022Code
ImageNet-Patch: A Dataset for Benchmarking Machine Learning Robustness against Adversarial Patches

Maura Pintor, Daniele Angioni, Angelo Sotgiu et al.

Adversarial patches are optimized contiguous pixel blocks in an input image that cause a machine-learning model to misclassify it. However, their optimization is computationally demanding, and requires careful hyperparameter tuning, potentially leading to suboptimal robustness evaluations. To overcome these issues, we propose ImageNet-Patch, a dataset to benchmark machine-learning models against adversarial patches. It consists of a set of patches, optimized to generalize across different models, and readily applicable to ImageNet data after preprocessing them with affine transformations. This process enables an approximate yet faster robustness evaluation, leveraging the transferability of adversarial perturbations. We showcase the usefulness of this dataset by testing the effectiveness of the computed patches against 127 models. We conclude by discussing how our dataset could be used as a benchmark for robustness, and how our methodology can be generalized to other domains. We open source our dataset and evaluation code at https://github.com/pralab/ImageNet-Patch.

CRSep 19, 2023Code
Nebula: Self-Attention for Dynamic Malware Analysis

Dmitrijs Trizna, Luca Demetrio, Battista Biggio et al.

Dynamic analysis enables detecting Windows malware by executing programs in a controlled environment and logging their actions. Previous work has proposed training machine learning models, i.e., convolutional and long short-term memory networks, on homogeneous input features like runtime APIs to either detect or classify malware, neglecting other relevant information coming from heterogeneous data like network and file operations. To overcome these issues, we introduce Nebula, a versatile, self-attention Transformer-based neural architecture that generalizes across different behavioral representations and formats, combining diverse information from dynamic log reports. Nebula is composed by several components needed to tokenize, filter, normalize and encode data to feed the transformer architecture. We firstly perform a comprehensive ablation study to evaluate their impact on the performance of the whole system, highlighting which components can be used as-is, and which must be enriched with specific domain knowledge. We perform extensive experiments on both malware detection and classification tasks, using three datasets acquired from different dynamic analyses platforms, show that, on average, Nebula outperforms state-of-the-art models at low false positive rates, with a peak of 12% improvement. Moreover, we showcase how self-supervised learning pre-training matches the performance of fully-supervised models with only 20% of training data, and we inspect the output of Nebula through explainable AI techniques, pinpointing how attention is focusing on specific tokens correlated to malicious activities of malware families. To foster reproducibility, we open-source our findings and models at https://github.com/dtrizna/nebula.

LGSep 2, 2024Code
Adversarial Pruning: A Survey and Benchmark of Pruning Methods for Adversarial Robustness

Giorgio Piras, Maura Pintor, Ambra Demontis et al.

Recent work has proposed neural network pruning techniques to reduce the size of a network while preserving robustness against adversarial examples, i.e., well-crafted inputs inducing a misclassification. These methods, which we refer to as adversarial pruning methods, involve complex and articulated designs, making it difficult to analyze the differences and establish a fair and accurate comparison. In this work, we overcome these issues by surveying current adversarial pruning methods and proposing a novel taxonomy to categorize them based on two main dimensions: the pipeline, defining when to prune; and the specifics, defining how to prune. We then highlight the limitations of current empirical analyses and propose a novel, fair evaluation benchmark to address them. We finally conduct an empirical re-evaluation of current adversarial pruning methods and discuss the results, highlighting the shared traits of top-performing adversarial pruning methods, as well as common issues. We welcome contributions in our publicly-available benchmark at https://github.com/pralab/AdversarialPruningBenchmark

LGJun 1, 2022
Support Vector Machines under Adversarial Label Contamination

Huang Xiao, Battista Biggio, Blaine Nelson et al.

Machine learning algorithms are increasingly being applied in security-related tasks such as spam and malware detection, although their security properties against deliberate attacks have not yet been widely understood. Intelligent and adaptive attackers may indeed exploit specific vulnerabilities exposed by machine learning techniques to violate system security. Being robust to adversarial data manipulation is thus an important, additional requirement for machine learning algorithms to successfully operate in adversarial settings. In this work, we evaluate the security of Support Vector Machines (SVMs) to well-crafted, adversarial label noise attacks. In particular, we consider an attacker that aims to maximize the SVM's classification error by flipping a number of labels in the training data. We formalize a corresponding optimal attack strategy, and solve it by means of heuristic approaches to keep the computational complexity tractable. We report an extensive experimental analysis on the effectiveness of the considered attacks against linear and non-linear SVMs, both on synthetic and real-world datasets. We finally argue that our approach can also provide useful insights for developing more secure SVM learning algorithms, and also novel techniques in a number of related research areas, such as semi-supervised and active learning.

LGMay 4, 2022
Wild Patterns Reloaded: A Survey of Machine Learning Security against Training Data Poisoning

Antonio Emanuele Cinà, Kathrin Grosse, Ambra Demontis et al.

The success of machine learning is fueled by the increasing availability of computing power and large training datasets. The training data is used to learn new models or update existing ones, assuming that it is sufficiently representative of the data that will be encountered at test time. This assumption is challenged by the threat of poisoning, an attack that manipulates the training data to compromise the model's performance at test time. Although poisoning has been acknowledged as a relevant threat in industry applications, and a variety of different attacks and defenses have been proposed so far, a complete systematization and critical review of the field is still missing. In this survey, we provide a comprehensive systematization of poisoning attacks and defenses in machine learning, reviewing more than 100 papers published in the field in the last 15 years. We start by categorizing the current threat models and attacks, and then organize existing defenses accordingly. While we focus mostly on computer-vision applications, we argue that our systematization also encompasses state-of-the-art attacks and defenses for other data modalities. Finally, we discuss existing resources for research in poisoning, and shed light on the current limitations and open research questions in this research field.

LGJul 11, 2024Code
HO-FMN: Hyperparameter Optimization for Fast Minimum-Norm Attacks

Raffaele Mura, Giuseppe Floris, Luca Scionis et al.

Gradient-based attacks are a primary tool to evaluate robustness of machine-learning models. However, many attacks tend to provide overly-optimistic evaluations as they use fixed loss functions, optimizers, step-size schedulers, and default hyperparameters. In this work, we tackle these limitations by proposing a parametric variation of the well-known fast minimum-norm attack algorithm, whose loss, optimizer, step-size scheduler, and hyperparameters can be dynamically adjusted. We re-evaluate 12 robust models, showing that our attack finds smaller adversarial perturbations without requiring any additional tuning. This also enables reporting adversarial robustness as a function of the perturbation budget, providing a more complete evaluation than that offered by fixed-budget attacks, while remaining efficient. We release our open-source code at https://github.com/pralab/HO-FMN.

CRApr 12, 2022
Machine Learning Security against Data Poisoning: Are We There Yet?

Antonio Emanuele Cinà, Kathrin Grosse, Ambra Demontis et al.

The recent success of machine learning (ML) has been fueled by the increasing availability of computing power and large amounts of data in many different applications. However, the trustworthiness of the resulting models can be compromised when such data is maliciously manipulated to mislead the learning process. In this article, we first review poisoning attacks that compromise the training data used to learn ML models, including attacks that aim to reduce the overall performance, manipulate the predictions on specific test samples, and even implant backdoors in the model. We then discuss how to mitigate these attacks using basic security principles, or by deploying ML-oriented defensive mechanisms. We conclude our article by formulating some relevant open challenges which are hindering the development of testing methods and benchmarks suitable for assessing and improving the trustworthiness of ML models against data poisoning attacks

CRMar 14, 2022
Energy-Latency Attacks via Sponge Poisoning

Antonio Emanuele Cinà, Ambra Demontis, Battista Biggio et al.

Sponge examples are test-time inputs optimized to increase energy consumption and prediction latency of deep networks deployed on hardware accelerators. By increasing the fraction of neurons activated during classification, these attacks reduce sparsity in network activation patterns, worsening the performance of hardware accelerators. In this work, we present a novel training-time attack, named sponge poisoning, which aims to worsen energy consumption and prediction latency of neural networks on any test input without affecting classification accuracy. To stage this attack, we assume that the attacker can control only a few model updates during training -- a likely scenario, e.g., when model training is outsourced to an untrusted third party or distributed via federated learning. Our extensive experiments on image classification tasks show that sponge poisoning is effective, and that fine-tuning poisoned models to repair them poses prohibitive costs for most users, highlighting that tackling sponge poisoning remains an open issue.

LGFeb 6, 2023
Dropout Injection at Test Time for Post Hoc Uncertainty Quantification in Neural Networks

Emanuele Ledda, Giorgio Fumera, Fabio Roli

Among Bayesian methods, Monte-Carlo dropout provides principled tools for evaluating the epistemic uncertainty of neural networks. Its popularity recently led to seminal works that proposed activating the dropout layers only during inference for evaluating uncertainty. This approach, which we call dropout injection, provides clear benefits over its traditional counterpart (which we call embedded dropout) since it allows one to obtain a post hoc uncertainty measure for any existing network previously trained without dropout, avoiding an additional, time-consuming training process. Unfortunately, no previous work compared injected and embedded dropout; therefore, we provide the first thorough investigation, focusing on regression problems. The main contribution of our work is to provide guidelines on the effective use of injected dropout so that it can be a practical alternative to the current use of embedded dropout. In particular, we show that its effectiveness strongly relies on a suitable scaling of the corresponding uncertainty measure, and we discuss the trade-off between negative log-likelihood and calibration error as a function of the scale factor. Experimental results on UCI data sets and crowd counting benchmarks support our claim that dropout injection can effectively behave as a competitive post hoc uncertainty quantification technique.

AIJul 9, 2024
Countermeasures Against Adversarial Examples in Radio Signal Classification

Lu Zhang, Sangarapillai Lambotharan, Gan Zheng et al.

Deep learning algorithms have been shown to be powerful in many communication network design problems, including that in automatic modulation classification. However, they are vulnerable to carefully crafted attacks called adversarial examples. Hence, the reliance of wireless networks on deep learning algorithms poses a serious threat to the security and operation of wireless networks. In this letter, we propose for the first time a countermeasure against adversarial examples in modulation classification. Our countermeasure is based on a neural rejection technique, augmented by label smoothing and Gaussian noise injection, that allows to detect and reject adversarial examples with high accuracy. Our results demonstrate that the proposed countermeasure can protect deep-learning based modulation classification systems against adversarial examples.

CVSep 20, 2023
3D Face Reconstruction: the Road to Forensics

Simone Maurizio La Cava, Giulia Orrù, Martin Drahansky et al.

3D face reconstruction algorithms from images and videos are applied to many fields, from plastic surgery to the entertainment sector, thanks to their advantageous features. However, when looking at forensic applications, 3D face reconstruction must observe strict requirements that still make its possible role in bringing evidence to a lawsuit unclear. An extensive investigation of the constraints, potential, and limits of its application in forensics is still missing. Shedding some light on this matter is the goal of the present survey, which starts by clarifying the relation between forensic applications and biometrics, with a focus on face recognition. Therefore, it provides an analysis of the achievements of 3D face reconstruction algorithms from surveillance videos and mugshot images and discusses the current obstacles that separate 3D face reconstruction from an active role in forensic applications. Finally, it examines the underlying data sets, with their advantages and limitations, while proposing alternatives that could substitute or complement them.

AIJul 9, 2024
A Hybrid Training-time and Run-time Defense Against Adversarial Attacks in Modulation Classification

Lu Zhang, Sangarapillai Lambotharan, Gan Zheng et al.

Motivated by the superior performance of deep learning in many applications including computer vision and natural language processing, several recent studies have focused on applying deep neural network for devising future generations of wireless networks. However, several recent works have pointed out that imperceptible and carefully designed adversarial examples (attacks) can significantly deteriorate the classification accuracy. In this paper, we investigate a defense mechanism based on both training-time and run-time defense techniques for protecting machine learning-based radio signal (modulation) classification against adversarial attacks. The training-time defense consists of adversarial training and label smoothing, while the run-time defense employs a support vector machine-based neural rejection (NR). Considering a white-box scenario and real datasets, we demonstrate that our proposed techniques outperform existing state-of-the-art technologies.

LGJul 1, 2023
Minimizing Energy Consumption of Deep Learning Models by Energy-Aware Training

Dario Lazzaro, Antonio Emanuele Cinà, Maura Pintor et al.

Deep learning models undergo a significant increase in the number of parameters they possess, leading to the execution of a larger number of operations during inference. This expansion significantly contributes to higher energy consumption and prediction latency. In this work, we propose EAT, a gradient-based algorithm that aims to reduce energy consumption during model training. To this end, we leverage a differentiable approximation of the $\ell_0$ norm, and use it as a sparse penalty over the training loss. Through our experimental analysis conducted on three datasets and two deep neural networks, we demonstrate that our energy-aware training algorithm EAT is able to train networks with a better trade-off between classification performance and energy efficiency.

CRJul 12, 2022
Practical Attacks on Machine Learning: A Case Study on Adversarial Windows Malware

Luca Demetrio, Battista Biggio, Fabio Roli

While machine learning is vulnerable to adversarial examples, it still lacks systematic procedures and tools for evaluating its security in different application contexts. In this article, we discuss how to develop automated and scalable security evaluations of machine learning using practical attacks, reporting a use case on Windows malware detection.

CVSep 13, 2023
Hardening RGB-D Object Recognition Systems against Adversarial Patch Attacks

Yang Zheng, Luca Demetrio, Antonio Emanuele Cinà et al.

RGB-D object recognition systems improve their predictive performances by fusing color and depth information, outperforming neural network architectures that rely solely on colors. While RGB-D systems are expected to be more robust to adversarial examples than RGB-only systems, they have also been proven to be highly vulnerable. Their robustness is similar even when the adversarial examples are generated by altering only the original images' colors. Different works highlighted the vulnerability of RGB-D systems; however, there is a lacking of technical explanations for this weakness. Hence, in our work, we bridge this gap by investigating the learned deep representation of RGB-D systems, discovering that color features make the function learned by the network more complex and, thus, more sensitive to small perturbations. To mitigate this problem, we propose a defense based on a detection mechanism that makes RGB-D systems more robust against adversarial examples. We empirically show that this defense improves the performances of RGB-D systems against adversarial examples even when they are computed ad-hoc to circumvent this detection mechanism, and that is also more effective than adversarial training.

CVSep 19, 2023
Adversarial Attacks Against Uncertainty Quantification

Emanuele Ledda, Daniele Angioni, Giorgio Piras et al.

Machine-learning models can be fooled by adversarial examples, i.e., carefully-crafted input perturbations that force models to output wrong predictions. While uncertainty quantification has been recently proposed to detect adversarial inputs, under the assumption that such attacks exhibit a higher prediction uncertainty than pristine data, it has been shown that adaptive attacks specifically aimed at reducing also the uncertainty estimate can easily bypass this defense mechanism. In this work, we focus on a different adversarial scenario in which the attacker is still interested in manipulating the uncertainty estimate, but regardless of the correctness of the prediction; in particular, the goal is to undermine the use of machine-learning models when their outputs are consumed by a downstream module or by a human operator. Following such direction, we: \textit{(i)} design a threat model for attacks targeting uncertainty quantification; \textit{(ii)} devise different attack strategies on conceptually different UQ techniques spanning for both classification and semantic segmentation problems; \textit{(iii)} conduct a first complete and extensive analysis to compare the differences between some of the most employed UQ approaches under attack. Our extensive experimental analysis shows that our attacks are more effective in manipulating uncertainty quantification measures than attacks aimed to also induce misclassifications.

LGDec 12, 2022
Security of Deep Reinforcement Learning for Autonomous Driving: A Survey

Ambra Demontis, Srishti Gupta, Maura Pintor et al.

Reinforcement learning (RL) enables agents to learn optimal behaviors through interaction with their environment and has been increasingly deployed in safety-critical applications, including autonomous driving. Despite its promise, RL is susceptible to attacks designed either to compromise policy learning or to induce erroneous decisions by trained agents. Although the literature on RL security has grown rapidly and several surveys exist, existing categorizations often fall short in guiding the selection of appropriate defenses for specific systems. In this work, we present a comprehensive survey of 86 recent studies on RL security, addressing these limitations by systematically categorizing attacks and defenses according to defined threat models and single- versus multi-agent settings. Furthermore, we examine the relevance and applicability of state-of-the-art attacks and defense mechanisms within the context of autonomous driving, providing insights to inform the design of robust RL systems.

CRAug 14, 2024
Sonic: Fast and Transferable Data Poisoning on Clustering Algorithms

Francesco Villani, Dario Lazzaro, Antonio Emanuele Cinà et al.

Data poisoning attacks on clustering algorithms have received limited attention, with existing methods struggling to scale efficiently as dataset sizes and feature counts increase. These attacks typically require re-clustering the entire dataset multiple times to generate predictions and assess the attacker's objectives, significantly hindering their scalability. This paper addresses these limitations by proposing Sonic, a novel genetic data poisoning attack that leverages incremental and scalable clustering algorithms, e.g., FISHDBC, as surrogates to accelerate poisoning attacks against graph-based and density-based clustering methods, such as HDBSCAN. We empirically demonstrate the effectiveness and efficiency of Sonic in poisoning the target clustering algorithms. We then conduct a comprehensive analysis of the factors affecting the scalability and transferability of poisoning attacks against clustering algorithms, and we conclude by examining the robustness of hyperparameters in our attack strategy Sonic.

AINov 11, 2025
SOM Directions are Better than One: Multi-Directional Refusal Suppression in Language Models

Giorgio Piras, Raffaele Mura, Fabio Brau et al.

Refusal refers to the functional behavior enabling safety-aligned language models to reject harmful or unethical prompts. Following the growing scientific interest in mechanistic interpretability, recent work encoded refusal behavior as a single direction in the model's latent space; e.g., computed as the difference between the centroids of harmful and harmless prompt representations. However, emerging evidence suggests that concepts in LLMs often appear to be encoded as a low-dimensional manifold embedded in the high-dimensional latent space. Motivated by these findings, we propose a novel method leveraging Self-Organizing Maps (SOMs) to extract multiple refusal directions. To this end, we first prove that SOMs generalize the prior work's difference-in-means technique. We then train SOMs on harmful prompt representations to identify multiple neurons. By subtracting the centroid of harmless representations from each neuron, we derive a set of multiple directions expressing the refusal concept. We validate our method on an extensive experimental setup, demonstrating that ablating multiple directions from models' internals outperforms not only the single-direction baseline but also specialized jailbreak algorithms, leading to an effective suppression of refusal. Finally, we conclude by analyzing the mechanistic implications of our approach.

AIMay 20
Latent-space Attacks for Refusal Evasion in Language Models

Giorgio Piras, Raffaele Mura, Fabio Brau et al.

Safety-aligned language models are trained to refuse harmful requests, yet refusal behavior can be suppressed by steering their internal representations. Existing methods do so by ablating a refusal direction from model activations, aiming to remove refusal from the model's residual stream. Despite their empirical success, these methods lack a principled account of the latent-space transformation they induce and why it suppresses refusal. In this work, we recast refusal suppression as a latent-space evasion attack against linear probes trained to separate refused from answered prompts. Under this view, prior work's difference-in-means direction naturally defines such a probe, and its ablation is exactly a projection onto its decision boundary, i.e., a minimum-confidence evasion attack. This perspective not only explains the empirical success of prior work but also admits a key limitation: evasion stops at the decision boundary, motivating the need to push representations further into the compliant region, i.e., where the model answers. We leverage this by proposing a Controlled Latent-space Evasion attack that projects representations past the boundary with an optimized confidence. We achieve state-of-the-art attack success rate across 15 instruction-tuned, multimodal, and reasoning models, outperforming existing refusal-ablation baselines and specialized jailbreak attacks.

CRApr 7
Harnessing Hyperbolic Geometry for Harmful Prompt Detection and Sanitization

Igor Maljkovic, Maria Rosaria Briglia, Iacopo Masi et al.

Vision-Language Models (VLMs) have become essential for tasks such as image synthesis, captioning, and retrieval by aligning textual and visual information in a shared embedding space. Yet, this flexibility also makes them vulnerable to malicious prompts designed to produce unsafe content, raising critical safety concerns. Existing defenses either rely on blacklist filters, which are easily circumvented, or on heavy classifier-based systems, both of which are costly and fragile under embedding-level attacks. We address these challenges with two complementary components: Hyperbolic Prompt Espial (HyPE) and Hyperbolic Prompt Sanitization (HyPS). HyPE is a lightweight anomaly detector that leverages the structured geometry of hyperbolic space to model benign prompts and detect harmful ones as outliers. HyPS builds on this detection by applying explainable attribution methods to identify and selectively modify harmful words, neutralizing unsafe intent while preserving the original semantics of user prompts. Through extensive experiments across multiple datasets and adversarial scenarios, we prove that our framework consistently outperforms prior defenses in both detection accuracy and robustness. Together, HyPE and HyPS offer an efficient, interpretable, and resilient approach to safeguarding VLMs against malicious prompt misuse.

CRFeb 28, 2024Code
Robust Synthetic Data-Driven Detection of Living-Off-the-Land Reverse Shells

Dmitrijs Trizna, Luca Demetrio, Battista Biggio et al.

Living-off-the-land (LOTL) techniques pose a significant challenge to security operations, exploiting legitimate tools to execute malicious commands that evade traditional detection methods. To address this, we present a robust augmentation framework for cyber defense systems as Security Information and Event Management (SIEM) solutions, enabling the detection of LOTL attacks such as reverse shells through machine learning. Leveraging real-world threat intelligence and adversarial training, our framework synthesizes diverse malicious datasets while preserving the variability of legitimate activity, ensuring high accuracy and low false-positive rates. We validate our approach through extensive experiments on enterprise-scale datasets, achieving a 90\% improvement in detection rates over non-augmented baselines at an industry-grade False Positive Rate (FPR) of $10^{-5}$. We define black-box data-driven attacks that successfully evade unprotected models, and develop defenses to mitigate them, producing adversarially robust variants of ML models. Ethical considerations are central to this work; we discuss safeguards for synthetic data generation and the responsible release of pre-trained models across four best performing architectures, including both adversarially and regularly trained variants: https://huggingface.co/dtrizna/quasarnix. Furthermore, we provide a malicious LOTL dataset containing over 1 million augmented attack variants to enable reproducible research and community collaboration: https://huggingface.co/datasets/dtrizna/QuasarNix. This work offers a reproducible, scalable, and production-ready defense against evolving LOTL threats.

LGJun 18, 2021Code
Indicators of Attack Failure: Debugging and Improving Optimization of Adversarial Examples

Maura Pintor, Luca Demetrio, Angelo Sotgiu et al.

Evaluating robustness of machine-learning models to adversarial examples is a challenging problem. Many defenses have been shown to provide a false sense of robustness by causing gradient-based attacks to fail, and they have been broken under more rigorous evaluations. Although guidelines and best practices have been suggested to improve current adversarial robustness evaluations, the lack of automatic testing and debugging tools makes it difficult to apply these recommendations in a systematic manner. In this work, we overcome these limitations by: (i) categorizing attack failures based on how they affect the optimization of gradient-based attacks, while also unveiling two novel failures affecting many popular attack implementations and past evaluations; (ii) proposing six novel indicators of failure, to automatically detect the presence of such failures in the attack optimization process; and (iii) suggesting a systematic protocol to apply the corresponding fixes. Our extensive experimental analysis, involving more than 15 models in 3 distinct application domains, shows that our indicators of failure can be used to debug and improve current adversarial robustness evaluations, thereby providing a first concrete step towards automatizing and systematizing them. Our open-source code is available at: https://github.com/pralab/IndicatorsOfAttackFailure.

CRAug 17, 2020Code
Adversarial EXEmples: A Survey and Experimental Evaluation of Practical Attacks on Machine Learning for Windows Malware Detection

Luca Demetrio, Scott E. Coull, Battista Biggio et al.

Recent work has shown that adversarial Windows malware samples - referred to as adversarial EXEmples in this paper - can bypass machine learning-based detection relying on static code analysis by perturbing relatively few input bytes. To preserve malicious functionality, previous attacks either add bytes to existing non-functional areas of the file, potentially limiting their effectiveness, or require running computationally-demanding validation steps to discard malware variants that do not correctly execute in sandbox environments. In this work, we overcome these limitations by developing a unifying framework that does not only encompass and generalize previous attacks against machine-learning models, but also includes three novel attacks based on practical, functionality-preserving manipulations to the Windows Portable Executable (PE) file format. These attacks, named Full DOS, Extend and Shift, inject the adversarial payload by respectively manipulating the DOS header, extending it, and shifting the content of the first section. Our experimental results show that these attacks outperform existing ones in both white-box and black-box scenarios, achieving a better trade-off in terms of evasion rate and size of the injected payload, while also enabling evasion of models that have been shown to be robust to previous attacks. To facilitate reproducibility of our findings, we open source our framework and all the corresponding attack implementations as part of the secml-malware Python library. We conclude this work by discussing the limitations of current machine learning-based malware detectors, along with potential mitigation strategies based on embedding domain knowledge coming from subject-matter experts directly into the learning process.

LGNov 25, 2018Code
Poisoning Behavioral Malware Clustering

Battista Biggio, Konrad Rieck, Davide Ariu et al.

Clustering algorithms have become a popular tool in computer security to analyze the behavior of malware variants, identify novel malware families, and generate signatures for antivirus systems. However, the suitability of clustering algorithms for security-sensitive settings has been recently questioned by showing that they can be significantly compromised if an attacker can exercise some control over the input data. In this paper, we revisit this problem by focusing on behavioral malware clustering approaches, and investigate whether and to what extent an attacker may be able to subvert these approaches through a careful injection of samples with poisoning behavior. To this end, we present a case study on Malheur, an open-source tool for behavioral malware clustering. Our experiments not only demonstrate that this tool is vulnerable to poisoning attacks, but also that it can be significantly compromised even if the attacker can only inject a very small percentage of attacks into the input data. As a remedy, we discuss possible countermeasures and highlight the need for more secure clustering algorithms.

LGApr 30, 2024
AttackBench: Evaluating Gradient-based Attacks for Adversarial Examples

Antonio Emanuele Cinà, Jérôme Rony, Maura Pintor et al.

Adversarial examples are typically optimized with gradient-based attacks. While novel attacks are continuously proposed, each is shown to outperform its predecessors using different experimental setups, hyperparameter settings, and number of forward and backward calls to the target models. This provides overly-optimistic and even biased evaluations that may unfairly favor one particular attack over the others. In this work, we aim to overcome these limitations by proposing AttackBench, i.e., the first evaluation framework that enables a fair comparison among different attacks. To this end, we first propose a categorization of gradient-based attacks, identifying their main components and differences. We then introduce our framework, which evaluates their effectiveness and efficiency. We measure these characteristics by (i) defining an optimality metric that quantifies how close an attack is to the optimal solution, and (ii) limiting the number of forward and backward queries to the model, such that all attacks are compared within a given maximum query budget. Our extensive experimental analysis compares more than $100$ attack implementations with a total of over $800$ different configurations against CIFAR-10 and ImageNet models, highlighting that only very few attacks outperform all the competing approaches. Within this analysis, we shed light on several implementation issues that prevent many attacks from finding better solutions or running at all. We release AttackBench as a publicly-available benchmark, aiming to continuously update it to include and evaluate novel gradient-based attacks for optimizing adversarial examples.

LGJun 13, 2025
Attention-based Adversarial Robust Distillation in Radio Signal Classifications for Low-Power IoT Devices

Lu Zhang, Sangarapillai Lambotharan, Gan Zheng et al.

Due to great success of transformers in many applications such as natural language processing and computer vision, transformers have been successfully applied in automatic modulation classification. We have shown that transformer-based radio signal classification is vulnerable to imperceptible and carefully crafted attacks called adversarial examples. Therefore, we propose a defense system against adversarial examples in transformer-based modulation classifications. Considering the need for computationally efficient architecture particularly for Internet of Things (IoT)-based applications or operation of devices in environment where power supply is limited, we propose a compact transformer for modulation classification. The advantages of robust training such as adversarial training in transformers may not be attainable in compact transformers. By demonstrating this, we propose a novel compact transformer that can enhance robustness in the presence of adversarial attacks. The new method is aimed at transferring the adversarial attention map from the robustly trained large transformer to a compact transformer. The proposed method outperforms the state-of-the-art techniques for the considered white-box scenarios including fast gradient method and projected gradient descent attacks. We have provided reasoning of the underlying working mechanisms and investigated the transferability of the adversarial examples between different architectures. The proposed method has the potential to protect the transformer from the transferability of adversarial examples.

CRMay 23, 2024
SLIFER: Investigating Performance and Robustness of Malware Detection Pipelines

Andrea Ponte, Dmitrijs Trizna, Luca Demetrio et al.

As a result of decades of research, Windows malware detection is approached through a plethora of techniques. However, there is an ongoing mismatch between academia -- which pursues an optimal performances in terms of detection rate and low false alarms -- and the requirements of real-world scenarios. In particular, academia focuses on combining static and dynamic analysis within a single or ensemble of models, falling into several pitfalls like (i) firing dynamic analysis without considering the computational burden it requires; (ii) discarding impossible-to-analyze samples; and (iii) analyzing robustness against adversarial attacks without considering that malware detectors are complemented with more non-machine-learning components. Thus, in this paper we bridge these gaps, by investigating the properties of malware detectors built with multiple and different types of analysis. To do so, we develop SLIFER, a Windows malware detection pipeline sequentially leveraging both static and dynamic analysis, interrupting computations as soon as one module triggers an alarm, requiring dynamic analysis only when needed. Contrary to the state of the art, we investigate how to deal with samples that impede analyzes, showing how much they impact performances, concluding that it is better to flag them as legitimate to not drastically increase false alarms. Lastly, we perform a robustness evaluation of SLIFER. Counter-intuitively, the injection of new content is either blocked more by signatures than dynamic analysis, due to byte artifacts created by the attack, or it is able to avoid detection from signatures, as they rely on constraints on file size disrupted by attacks. As far as we know, we are the first to investigate the properties of sequential malware detectors, shedding light on their behavior in real production environment.

CRDec 24, 2024
SoK: On the Offensive Potential of AI

Saskia Laura Schröer, Giovanni Apruzzese, Soheil Human et al.

Our society increasingly benefits from Artificial Intelligence (AI). Unfortunately, more and more evidence shows that AI is also used for offensive purposes. Prior works have revealed various examples of use cases in which the deployment of AI can lead to violation of security and privacy objectives. No extant work, however, has been able to draw a holistic picture of the offensive potential of AI. In this SoK paper we seek to lay the ground for a systematic analysis of the heterogeneous capabilities of offensive AI. In particular we (i) account for AI risks to both humans and systems while (ii) consolidating and distilling knowledge from academic literature, expert opinions, industrial venues, as well as laypeople -- all of which being valuable sources of information on offensive AI. To enable alignment of such diverse sources of knowledge, we devise a common set of criteria reflecting essential technological factors related to offensive AI. With the help of such criteria, we systematically analyze: 95 research papers; 38 InfoSec briefings (from, e.g., BlackHat); the responses of a user study (N=549) entailing individuals with diverse backgrounds and expertise; and the opinion of 12 experts. Our contributions not only reveal concerning ways (some of which overlooked by prior work) in which AI can be offensively used today, but also represent a foothold to address this threat in the years to come.

LGFeb 27, 2024
Robustness-Congruent Adversarial Training for Secure Machine Learning Model Updates

Daniele Angioni, Luca Demetrio, Maura Pintor et al.

Machine-learning models demand periodic updates to improve their average accuracy, exploiting novel architectures and additional data. However, a newly updated model may commit mistakes the previous model did not make. Such misclassifications are referred to as negative flips, experienced by users as a regression of performance. In this work, we show that this problem also affects robustness to adversarial examples, hindering the development of secure model update practices. In particular, when updating a model to improve its adversarial robustness, previously ineffective adversarial attacks on some inputs may become successful, causing a regression in the perceived security of the system. We propose a novel technique, named robustness-congruent adversarial training, to address this issue. It amounts to fine-tuning a model with adversarial training, while constraining it to retain higher robustness on the samples for which no adversarial example was found before the update. We show that our algorithm and, more generally, learning with non-regression constraints, provides a theoretically-grounded framework to train consistent estimators. Our experiments on robust models for computer vision confirm that both accuracy and robustness, even if improved after model update, can be affected by negative flips, and our robustness-congruent adversarial training can mitigate the problem, outperforming competing baseline methods.

LGMay 23, 2024
A New Formulation for Zeroth-Order Optimization of Adversarial EXEmples in Malware Detection

Marco Rando, Luca Demetrio, Lorenzo Rosasco et al.

Machine learning malware detectors are vulnerable to adversarial EXEmples, i.e. carefully-crafted Windows programs tailored to evade detection. Unlike other adversarial problems, attacks in this context must be functionality-preserving, a constraint which is challenging to address. As a consequence heuristic algorithms are typically used, that inject new content, either randomly-picked or harvested from legitimate programs. In this paper, we show how learning malware detectors can be cast within a zeroth-order optimization framework which allows to incorporate functionality-preserving manipulations. This permits the deployment of sound and efficient gradient-free optimization algorithms, which come with theoretical guarantees and allow for minimal hyper-parameters tuning. As a by-product, we propose and study ZEXE, a novel zero-order attack against Windows malware detection. Compared to state-of-the-art techniques, ZEXE provides drastic improvement in the evasion rate, while reducing to less than one third the size of the injected content.

LGJun 13, 2025
A Neural Rejection System Against Universal Adversarial Perturbations in Radio Signal Classification

Lu Zhang, Sangarapillai Lambotharan, Gan Zheng et al.

Advantages of deep learning over traditional methods have been demonstrated for radio signal classification in the recent years. However, various researchers have discovered that even a small but intentional feature perturbation known as adversarial examples can significantly deteriorate the performance of the deep learning based radio signal classification. Among various kinds of adversarial examples, universal adversarial perturbation has gained considerable attention due to its feature of being data independent, hence as a practical strategy to fool the radio signal classification with a high success rate. Therefore, in this paper, we investigate a defense system called neural rejection system to propose against universal adversarial perturbations, and evaluate its performance by generating white-box universal adversarial perturbations. We show that the proposed neural rejection system is able to defend universal adversarial perturbations with significantly higher accuracy than the undefended deep neural network.

LGJul 24, 2025
Regression-aware Continual Learning for Android Malware Detection

Daniele Ghiani, Daniele Angioni, Giorgio Piras et al.

Malware evolves rapidly, forcing machine learning (ML)-based detectors to adapt continuously. With antivirus vendors processing hundreds of thousands of new samples daily, datasets can grow to billions of examples, making full retraining impractical. Continual learning (CL) has emerged as a scalable alternative, enabling incremental updates without full data access while mitigating catastrophic forgetting. In this work, we analyze a critical yet overlooked issue in this context: security regression. Unlike forgetting, which manifests as a general performance drop on previously seen data, security regression captures harmful prediction changes at the sample level, such as a malware sample that was once correctly detected but evades detection after a model update. Although often overlooked, regressions pose serious risks in security-critical applications, as the silent reintroduction of previously detected threats in the system may undermine users' trust in the whole updating process. To address this issue, we formalize and quantify security regression in CL-based malware detectors and propose a regression-aware penalty to mitigate it. Specifically, we adapt Positive Congruent Training (PCT) to the CL setting, preserving prior predictive behavior in a model-agnostic manner. Experiments on the ELSA, Tesseract, and AZ-Class datasets show that our method effectively reduces regression across different CL scenarios while maintaining strong detection performance over time.

LGJun 13, 2025
Vision Transformer with Adversarial Indicator Token against Adversarial Attacks in Radio Signal Classifications

Lu Zhang, Sangarapillai Lambotharan, Gan Zheng et al.

The remarkable success of transformers across various fields such as natural language processing and computer vision has paved the way for their applications in automatic modulation classification, a critical component in the communication systems of Internet of Things (IoT) devices. However, it has been observed that transformer-based classification of radio signals is susceptible to subtle yet sophisticated adversarial attacks. To address this issue, we have developed a defensive strategy for transformer-based modulation classification systems to counter such adversarial attacks. In this paper, we propose a novel vision transformer (ViT) architecture by introducing a new concept known as adversarial indicator (AdvI) token to detect adversarial attacks. To the best of our knowledge, this is the first work to propose an AdvI token in ViT to defend against adversarial attacks. Integrating an adversarial training method with a detection mechanism using AdvI token, we combine a training time defense and running time defense in a unified neural network model, which reduces architectural complexity of the system compared to detecting adversarial perturbations using separate models. We investigate into the operational principles of our method by examining the attention mechanism. We show the proposed AdvI token acts as a crucial element within the ViT, influencing attention weights and thereby highlighting regions or features in the input data that are potentially suspicious or anomalous. Through experimental results, we demonstrate that our approach surpasses several competitive methods in handling white-box attack scenarios, including those utilizing the fast gradient method, projected gradient descent attacks and basic iterative method.

CVDec 13, 2024
Robust image classification with multi-modal large language models

Francesco Villani, Igor Maljkovic, Dario Lazzaro et al.

Deep Neural Networks are vulnerable to adversarial examples, i.e., carefully crafted input samples that can cause models to make incorrect predictions with high confidence. To mitigate these vulnerabilities, adversarial training and detection-based defenses have been proposed to strengthen models in advance. However, most of these approaches focus on a single data modality, overlooking the relationships between visual patterns and textual descriptions of the input. In this paper, we propose a novel defense, MultiShield, designed to combine and complement these defenses with multi-modal information to further enhance their robustness. MultiShield leverages multi-modal large language models to detect adversarial examples and abstain from uncertain classifications when there is no alignment between textual and visual representations of the input. Extensive evaluations on CIFAR-10 and ImageNet datasets, using robust and non-robust image classification models, demonstrate that MultiShield can be easily integrated to detect and reject adversarial examples, outperforming the original defenses.

LGOct 29, 2024
On the Robustness of Adversarial Training Against Uncertainty Attacks

Emanuele Ledda, Giovanni Scodeller, Daniele Angioni et al.

In learning problems, the noise inherent to the task at hand hinders the possibility to infer without a certain degree of uncertainty. Quantifying this uncertainty, regardless of its wide use, assumes high relevance for security-sensitive applications. Within these scenarios, it becomes fundamental to guarantee good (i.e., trustworthy) uncertainty measures, which downstream modules can securely employ to drive the final decision-making process. However, an attacker may be interested in forcing the system to produce either (i) highly uncertain outputs jeopardizing the system's availability or (ii) low uncertainty estimates, making the system accept uncertain samples that would instead require a careful inspection (e.g., human intervention). Therefore, it becomes fundamental to understand how to obtain robust uncertainty estimates against these kinds of attacks. In this work, we reveal both empirically and theoretically that defending against adversarial examples, i.e., carefully perturbed samples that cause misclassification, additionally guarantees a more secure, trustworthy uncertainty estimate under common attack scenarios without the need for an ad-hoc defense strategy. To support our claims, we evaluate multiple adversarial-robust models from the publicly available benchmark RobustBench on the CIFAR-10 and ImageNet datasets.

LGDec 16, 2025
Out-of-Distribution Detection for Continual Learning: Design Principles and Benchmarking

Srishti Gupta, Riccardo Balia, Daniele Angioni et al.

Recent years have witnessed significant progress in the development of machine learning models across a wide range of fields, fueled by increased computational resources, large-scale datasets, and the rise of deep learning architectures. From malware detection to enabling autonomous navigation, modern machine learning systems have demonstrated remarkable capabilities. However, as these models are deployed in ever-changing real-world scenarios, their ability to remain reliable and adaptive over time becomes increasingly important. For example, in the real world, new malware families are continuously developed, whereas autonomous driving cars are employed in many different cities and weather conditions. Models trained in fixed settings can not respond effectively to novel conditions encountered post-deployment. In fact, most machine learning models are still developed under the assumption that training and test data are independent and identically distributed (i.i.d.), i.e., sampled from the same underlying (unknown) distribution. While this assumption simplifies model development and evaluation, it does not hold in many real-world applications, where data changes over time and unexpected inputs frequently occur. Retraining models from scratch whenever new data appears is computationally expensive, time-consuming, and impractical in resource-constrained environments. These limitations underscore the need for Continual Learning (CL), which enables models to incrementally learn from evolving data streams without forgetting past knowledge, and Out-of-Distribution (OOD) detection, which allows systems to identify and respond to novel or anomalous inputs. Jointly addressing both challenges is critical to developing robust, efficient, and adaptive AI systems.

CRAug 13, 2025
Demystifying the Role of Rule-based Detection in AI Systems for Windows Malware Detection

Andrea Ponte, Luca Demetrio, Luca Oneto et al.

Malware detection increasingly relies on AI systems that integrate signature-based detection with machine learning. However, these components are typically developed and combined in isolation, missing opportunities to reduce data complexity and strengthen defenses against adversarial EXEmples, carefully crafted programs designed to evade detection. Hence, in this work we investigate the influence that signature-based detection exerts on model training, when they are included inside the training pipeline. Specifically, we compare models trained on a comprehensive dataset with an AI system whose machine learning component is trained solely on samples not already flagged by signatures. Our results demonstrate improved robustness to both adversarial EXEmples and temporal data drift, although this comes at the cost of a fixed lower bound on false positives, driven by suboptimal rule selection. We conclude by discussing these limitations and outlining how future research could extend AI-based malware detection to include dynamic analysis, thereby further enhancing system resilience.

CRJul 4, 2025
Evaluating the Evaluators: Trust in Adversarial Robustness Tests

Antonio Emanuele Cinà, Maura Pintor, Luca Demetrio et al.

Despite significant progress in designing powerful adversarial evasion attacks for robustness verification, the evaluation of these methods often remains inconsistent and unreliable. Many assessments rely on mismatched models, unverified implementations, and uneven computational budgets, which can lead to biased results and a false sense of security. Consequently, robustness claims built on such flawed testing protocols may be misleading and give a false sense of security. As a concrete step toward improving evaluation reliability, we present AttackBench, a benchmark framework developed to assess the effectiveness of gradient-based attacks under standardized and reproducible conditions. AttackBench serves as an evaluation tool that ranks existing attack implementations based on a novel optimality metric, which enables researchers and practitioners to identify the most reliable and effective attack for use in subsequent robustness evaluations. The framework enforces consistent testing conditions and enables continuous updates, making it a reliable foundation for robustness verification.

CRJun 11, 2025
Empirical Quantification of Spurious Correlations in Malware Detection

Bianca Perasso, Ludovico Lozza, Andrea Ponte et al.

End-to-end deep learning exhibits unmatched performance for detecting malware, but such an achievement is reached by exploiting spurious correlations -- features with high relevance at inference time, but known to be useless through domain knowledge. While previous work highlighted that deep networks mainly focus on metadata, none investigated the phenomenon further, without quantifying their impact on the decision. In this work, we deepen our understanding of how spurious correlation affects deep learning for malware detection by highlighting how much models rely on empty spaces left by the compiler, which diminishes the relevance of the compiled code. Through our seminal analysis on a small-scale balanced dataset, we introduce a ranking of two end-to-end models to better understand which is more suitable to be put in production.

LGMay 29, 2025
Buffer-free Class-Incremental Learning with Out-of-Distribution Detection

Srishti Gupta, Daniele Angioni, Maura Pintor et al.

Class-incremental learning (CIL) poses significant challenges in open-world scenarios, where models must not only learn new classes over time without forgetting previous ones but also handle inputs from unknown classes that a closed-set model would misclassify. Recent works address both issues by (i)~training multi-head models using the task-incremental learning framework, and (ii) predicting the task identity employing out-of-distribution (OOD) detectors. While effective, the latter mainly relies on joint training with a memory buffer of past data, raising concerns around privacy, scalability, and increased training time. In this paper, we present an in-depth analysis of post-hoc OOD detection methods and investigate their potential to eliminate the need for a memory buffer. We uncover that these methods, when applied appropriately at inference time, can serve as a strong substitute for buffer-based OOD detection. We show that this buffer-free approach achieves comparable or superior performance to buffer-based methods both in terms of class-incremental learning and the rejection of unknown samples. Experimental results on CIFAR-10, CIFAR-100 and Tiny ImageNet datasets support our findings, offering new insights into the design of efficient and privacy-preserving CIL systems for open-world settings.

LGJun 14, 2024
Over-parameterization and Adversarial Robustness in Neural Networks: An Overview and Empirical Analysis

Srishti Gupta, Zhang Chen, Luca Demetrio et al.

Thanks to their extensive capacity, over-parameterized neural networks exhibit superior predictive capabilities and generalization. However, having a large parameter space is considered one of the main suspects of the neural networks' vulnerability to adversarial example -- input samples crafted ad-hoc to induce a desired misclassification. Relevant literature has claimed contradictory remarks in support of and against the robustness of over-parameterized networks. These contradictory findings might be due to the failure of the attack employed to evaluate the networks' robustness. Previous research has demonstrated that depending on the considered model, the algorithm employed to generate adversarial examples may not function properly, leading to overestimating the model's robustness. In this work, we empirically study the robustness of over-parameterized networks against adversarial examples. However, unlike the previous works, we also evaluate the considered attack's reliability to support the results' veracity. Our results show that over-parameterized networks are robust against adversarial attacks as opposed to their under-parameterized counterparts.

CVJan 24, 2024
Serial fusion of multi-modal biometric systems

Gian Luca Marcialis, Paolo Mastinu, Fabio Roli

Serial, or sequential, fusion of multiple biometric matchers has been not thoroughly investigated so far. However, this approach exhibits some advantages with respect to the widely adopted parallel approaches. In this paper, we propose a novel theoretical framework for the assessment of performance of such systems, based on a previous work of the authors. Benefits in terms of performance are theoretically evaluated, as well as estimation errors in the model parameters computation. Model is analyzed from the viewpoint of its pros and cons, by mean of preliminary experiments performed on NIST Biometric Score Set 1.

CROct 20, 2021
Fingerprint recognition with embedded presentation attacks detection: are we ready?

Marco Micheletto, Gian Luca Marcialis, Giulia Orrù et al.

The diffusion of fingerprint verification systems for security applications makes it urgent to investigate the embedding of software-based presentation attack detection algorithms (PAD) into such systems. Companies and institutions need to know whether such integration would make the system more "secure" and whether the technology available is ready, and, if so, at what operational working conditions. Despite significant improvements, especially by adopting deep learning approaches to fingerprint PAD, current research did not state much about their effectiveness when embedded in fingerprint verification systems. We believe that the lack of works is explained by the lack of instruments to investigate the problem, that is, modeling the cause-effect relationships when two non-zero error-free systems work together. Accordingly, this paper explores the fusion of PAD into verification systems by proposing a novel investigation instrument: a performance simulator based on the probabilistic modeling of the relationships among the Receiver Operating Characteristics (ROC) of the two individual systems when PAD and verification stages are implemented sequentially. As a matter of fact, this is the most straightforward, flexible, and widespread approach. We carry out simulations on the PAD algorithms' ROCs submitted to the most recent editions of LivDet (2017-2019), the state-of-the-art NIST Bozorth3, and the top-level Veryfinger 12 matchers. Reported experiments explore significant scenarios to get the conditions under which fingerprint matching with embedded PAD can improve, rather than degrade, the overall personal verification performance.

CVOct 12, 2021
Can machines learn to see without visual databases?

Alessandro Betti, Marco Gori, Stefano Melacci et al.

This paper sustains the position that the time has come for thinking of learning machines that conquer visual skills in a truly human-like context, where a few human-like object supervisions are given by vocal interactions and pointing aids only. This likely requires new foundations on computational processes of vision with the final purpose of involving machines in tasks of visual description by living in their own visual environment under simple man-machine linguistic interactions. The challenge consists of developing machines that learn to see without needing to handle visual databases. This might open the doors to a truly orthogonal competitive track concerning deep learning technologies for vision which does not rely on the accumulation of huge visual databases.

LGAug 26, 2021
Why Adversarial Reprogramming Works, When It Fails, and How to Tell the Difference

Yang Zheng, Xiaoyi Feng, Zhaoqiang Xia et al.

Adversarial reprogramming allows repurposing a machine-learning model to perform a different task. For example, a model trained to recognize animals can be reprogrammed to recognize digits by embedding an adversarial program in the digit images provided as input. Recent work has shown that adversarial reprogramming may not only be used to abuse machine-learning models provided as a service, but also beneficially, to improve transfer learning when training data is scarce. However, the factors affecting its success are still largely unexplained. In this work, we develop a first-order linear model of adversarial reprogramming to show that its success inherently depends on the size of the average input gradient, which grows when input gradients are more aligned, and when inputs have higher dimensionality. The results of our experimental analysis, involving fourteen distinct reprogramming tasks, show that the above factors are correlated with the success and the failure of adversarial reprogramming.

LGJun 14, 2021
Backdoor Learning Curves: Explaining Backdoor Poisoning Beyond Influence Functions

Antonio Emanuele Cinà, Kathrin Grosse, Sebastiano Vascon et al.

Backdoor attacks inject poisoning samples during training, with the goal of forcing a machine learning model to output an attacker-chosen class when presented a specific trigger at test time. Although backdoor attacks have been demonstrated in a variety of settings and against different models, the factors affecting their effectiveness are still not well understood. In this work, we provide a unifying framework to study the process of backdoor learning under the lens of incremental learning and influence functions. We show that the effectiveness of backdoor attacks depends on: (i) the complexity of the learning algorithm, controlled by its hyperparameters; (ii) the fraction of backdoor samples injected into the training set; and (iii) the size and visibility of the backdoor trigger. These factors affect how fast a model learns to correlate the presence of the backdoor trigger with the target class. Our analysis unveils the intriguing existence of a region in the hyperparameter space in which the accuracy on clean test samples is still high while backdoor attacks are ineffective, thereby suggesting novel criteria to improve existing defenses.

LGMay 2, 2021
BAARD: Blocking Adversarial Examples by Testing for Applicability, Reliability and Decidability

Xinglong Chang, Katharina Dost, Kaiqi Zhao et al.

Adversarial defenses protect machine learning models from adversarial attacks, but are often tailored to one type of model or attack. The lack of information on unknown potential attacks makes detecting adversarial examples challenging. Additionally, attackers do not need to follow the rules made by the defender. To address this problem, we take inspiration from the concept of Applicability Domain in cheminformatics. Cheminformatics models struggle to make accurate predictions because only a limited number of compounds are known and available for training. Applicability Domain defines a domain based on the known compounds and rejects any unknown compound that falls outside the domain. Similarly, adversarial examples start as harmless inputs, but can be manipulated to evade reliable classification by moving outside the domain of the classifier. We are the first to identify the similarity between Applicability Domain and adversarial detection. Instead of focusing on unknown attacks, we focus on what is known, the training data. We propose a simple yet robust triple-stage data-driven framework that checks the input globally and locally, and confirms that they are coherent with the model's output. This framework can be applied to any classification model and is not limited to specific attacks. We demonstrate these three stages work as one unit, effectively detecting various attacks, even for a white-box scenario.

LGMar 23, 2021
The Hammer and the Nut: Is Bilevel Optimization Really Needed to Poison Linear Classifiers?

Antonio Emanuele Cinà, Sebastiano Vascon, Ambra Demontis et al.

One of the most concerning threats for modern AI systems is data poisoning, where the attacker injects maliciously crafted training data to corrupt the system's behavior at test time. Availability poisoning is a particularly worrisome subset of poisoning attacks where the attacker aims to cause a Denial-of-Service (DoS) attack. However, the state-of-the-art algorithms are computationally expensive because they try to solve a complex bi-level optimization problem (the "hammer"). We observed that in particular conditions, namely, where the target model is linear (the "nut"), the usage of computationally costly procedures can be avoided. We propose a counter-intuitive but efficient heuristic that allows contaminating the training set such that the target system's performance is highly compromised. We further suggest a re-parameterization trick to decrease the number of variables to be optimized. Finally, we demonstrate that, under the considered settings, our framework achieves comparable, or even better, performances in terms of the attacker's objective while being significantly more computationally efficient.

LGFeb 25, 2021
Fast Minimum-norm Adversarial Attacks through Adaptive Norm Constraints

Maura Pintor, Fabio Roli, Wieland Brendel et al.

Evaluating adversarial robustness amounts to finding the minimum perturbation needed to have an input sample misclassified. The inherent complexity of the underlying optimization requires current gradient-based attacks to be carefully tuned, initialized, and possibly executed for many computationally-demanding iterations, even if specialized to a given perturbation model. In this work, we overcome these limitations by proposing a fast minimum-norm (FMN) attack that works with different $\ell_p$-norm perturbation models ($p=0, 1, 2, \infty$), is robust to hyperparameter choices, does not require adversarial starting points, and converges within few lightweight steps. It works by iteratively finding the sample misclassified with maximum confidence within an $\ell_p$-norm constraint of size $ε$, while adapting $ε$ to minimize the distance of the current sample to the decision boundary. Extensive experiments show that FMN significantly outperforms existing attacks in terms of convergence speed and computation time, while reporting comparable or even smaller perturbation sizes.