LGNov 25, 2018Code
Poisoning Behavioral Malware ClusteringBattista Biggio, Konrad Rieck, Davide Ariu et al.
Clustering algorithms have become a popular tool in computer security to analyze the behavior of malware variants, identify novel malware families, and generate signatures for antivirus systems. However, the suitability of clustering algorithms for security-sensitive settings has been recently questioned by showing that they can be significantly compromised if an attacker can exercise some control over the input data. In this paper, we revisit this problem by focusing on behavioral malware clustering approaches, and investigate whether and to what extent an attacker may be able to subvert these approaches through a careful injection of samples with poisoning behavior. To this end, we present a case study on Malheur, an open-source tool for behavioral malware clustering. Our experiments not only demonstrate that this tool is vulnerable to poisoning attacks, but also that it can be significantly compromised even if the attacker can only inject a very small percentage of attacks into the input data. As a remedy, we discuss possible countermeasures and highlight the need for more secure clustering algorithms.
CRNov 15, 2016Code
AdversariaLib: An Open-source Library for the Security Evaluation of Machine Learning Algorithms Under AttackIgino Corona, Battista Biggio, Davide Maiorca
We present AdversariaLib, an open-source python library for the security evaluation of machine learning (ML) against carefully-targeted attacks. It supports the implementation of several attacks proposed thus far in the literature of adversarial learning, allows for the evaluation of a wide range of ML algorithms, runs on multiple platforms, and has multi-processing enabled. The library has a modular architecture that makes it easy to use and to extend by implementing novel attacks and countermeasures. It relies on other widely-used open-source ML libraries, including scikit-learn and FANN. Classification algorithms are implemented and optimized in C/C++, allowing for a fast evaluation of the simulated attacks. The package is distributed under the GNU General Public License v3, and it is available for download at http://sourceforge.net/projects/adversarialib.
LGJan 30, 2014Code
Security Evaluation of Support Vector Machines in Adversarial EnvironmentsBattista Biggio, Igino Corona, Blaine Nelson et al.
Support Vector Machines (SVMs) are among the most popular classification techniques adopted in security applications like malware detection, intrusion detection, and spam filtering. However, if SVMs are to be incorporated in real-world security systems, they must be able to cope with attack patterns that can either mislead the learning algorithm (poisoning), evade detection (evasion), or gain information about their internal parameters (privacy breaches). The main contributions of this chapter are twofold. First, we introduce a formal general framework for the empirical evaluation of the security of machine-learning systems. Second, according to our framework, we demonstrate the feasibility of evasion, poisoning and privacy attacks against SVMs in real-world security problems. For each attack technique, we evaluate its impact and discuss whether (and how) it can be countered through an adversary-aware design of SVMs. Our experiments are easily reproducible thanks to open-source code that we have made available, together with all the employed datasets, on a public repository.
CRAug 21, 2017
Evasion Attacks against Machine Learning at Test TimeBattista Biggio, Igino Corona, Davide Maiorca et al.
In security-sensitive applications, the success of machine learning depends on a thorough vetting of their resistance to adversarial data. In one pertinent, well-motivated attack scenario, an adversary may attempt to evade a deployed system at test time by carefully manipulating attack samples. In this work, we present a simple but effective gradient-based approach that can be exploited to systematically assess the security of several, widely-used classification algorithms against evasion attacks. Following a recently proposed framework for security evaluation, we simulate attack scenarios that exhibit different risk levels for the classifier by increasing the attacker's knowledge of the system and her ability to manipulate attack samples. This gives the classifier designer a better picture of the classifier performance under evasion attacks, and allows him to perform a more informed model selection (or parameter setting). We evaluate our approach on the relevant security task of malware detection in PDF files, and show that such systems can be easily evaded. We also sketch some countermeasures suggested by our analysis.
CRJul 2, 2017
DeltaPhish: Detecting Phishing Webpages in Compromised WebsitesIgino Corona, Battista Biggio, Matteo Contini et al.
The large-scale deployment of modern phishing attacks relies on the automatic exploitation of vulnerable websites in the wild, to maximize profit while hindering attack traceability, detection and blacklisting. To the best of our knowledge, this is the first work that specifically leverages this adversarial behavior for detection purposes. We show that phishing webpages can be accurately detected by highlighting HTML code and visual differences with respect to other (legitimate) pages hosted within a compromised website. Our system, named DeltaPhish, can be installed as part of a web application firewall, to detect the presence of anomalous content on a website after compromise, and eventually prevent access to it. DeltaPhish is also robust against adversarial attempts in which the HTML code of the phishing page is carefully manipulated to evade detection. We empirically evaluate it on more than 5,500 webpages collected in the wild from compromised websites, showing that it is capable of detecting more than 99% of phishing webpages, while only misclassifying less than 1% of legitimate pages. We further show that the detection rate remains higher than 70% even under very sophisticated attacks carefully designed to evade our system.
CRApr 28, 2017
Yes, Machine Learning Can Be More Secure! A Case Study on Android Malware DetectionAmbra Demontis, Marco Melis, Battista Biggio et al.
To cope with the increasing variability and sophistication of modern attacks, machine learning has been widely adopted as a statistically-sound tool for malware detection. However, its security against well-crafted attacks has not only been recently questioned, but it has been shown that machine learning exhibits inherent vulnerabilities that can be exploited to evade detection at test time. In other words, machine learning itself can be the weakest link in a security system. In this paper, we rely upon a previously-proposed attack framework to categorize potential attack scenarios against learning-based malware detection tools, by modeling attackers with different skills and capabilities. We then define and implement a set of corresponding evasion attacks to thoroughly assess the security of Drebin, an Android malware detector. The main contribution of this work is the proposal of a simple and scalable secure-learning paradigm that mitigates the impact of evasion attacks, while only slightly worsening the detection rate in the absence of attack. We finally argue that our secure-learning approach can also be readily applied to other malware detection tasks.