SIMar 25, 2022
Machine-Learning Based Objective Function Selection for Community DetectionAsa Bornstein, Amir Rubin, Danny Hendler
NECTAR, a Node-centric ovErlapping Community deTection AlgoRithm, presented in 2016 by Cohen et. al, chooses dynamically between two objective functions which function to optimize, based on the network on which it is invoked. This approach, as shown by Cohen et al., outperforms six state-of-the-art algorithms for overlapping community detection. In this work, we present NECTAR-ML, an extension of the NECTAR algorithm that uses a machine-learning based model for automating the selection of the objective function, trained and evaluated on a dataset of 15,755 synthetic and 7 real-world networks. Our analysis shows that in approximately 90% of the cases our model was able to successfully select the correct objective function. We conducted a competitive analysis of NECTAR and NECTAR-ML. NECTAR-ML was shown to significantly outperform NECTAR's ability to select the best objective function. We also conducted a competitive analysis of NECTAR-ML and two additional state-of-the-art multi-objective community detection algorithms. NECTAR-ML outperformed both algorithms in terms of average detection quality. Multiobjective EAs (MOEAs) are considered to be the most popular approach to solve MOP and the fact that NECTAR-ML significantly outperforms them demonstrates the effectiveness of ML-based objective function selection.
CRAug 26, 2019
Modeling infection methods of computer malware in the presence of vaccinations using epidemiological models: An analysis of real-world dataElad Yom-Tov, Nir Levy, Amir Rubin
Computer malware and biological pathogens often use similar mechanisms of infections. For this reason, it has been suggested to model malware spread using epidemiological models developed to characterize the spread of biological pathogens. However, most work examining the similarities between malware and pathogens using such methods was based on theoretical analysis and simulation. Here we extend the classical Susceptible-Infected-Recovered (SIR) epidemiological model to describe two of the most common infection methods used by malware. We fit the proposed model to malware collected over a period of one year from a major anti-malware vendor. We show that by fitting the proposed model it is possible to identify the method of transmission used by the malware, its rate of infection, and the number of machines which will be infected unless blocked by anti-virus software. The Spearman correlation between the number of actual and predicted infected machines is 0.84. Examining cases where an anti-malware "signature" was transmitted to susceptible computers by the anti-virus provider, we show that the time to remove the malware will be short and independent of the number of infected computers if fewer than approximately 60% of susceptible computers have been infected. If more computers were infected, the time to removal will be approximately 3.2 greater, and will depend on the fraction of infected computers. Our results show that the application of epidemiological models of infection to malware can provide anti-virus providers with information on malware spread and its potential damage. We further propose that similarities between computer malware and biological pathogens, the availability of data on the former and the dearth of data on the latter, make malware an extremely useful model for testing interventions which could later be applied to improve medicine.
CRMay 23, 2019
AMSI-Based Detection of Malicious PowerShell Code Using Contextual EmbeddingsAmir Rubin, Shay Kels, Danny Hendler
PowerShell is a command-line shell, supporting a scripting language. It is widely used in organizations for configuration management and task automation but is also increasingly used by cybercriminals for launching cyberattacks against organizations, mainly because it is pre-installed on Windows machines and exposes strong functionality that may be leveraged by attackers. This makes the problem of detecting malicious PowerShell code both urgent and challenging. Microsoft's Antimalware Scan Interface (AMSI) allows defending systems to scan all the code passed to scripting engines such as PowerShell prior to its execution. In this work, we conduct the first study of malicious PowerShell code detection using the information made available by AMSI. We present several novel deep-learning based detectors of malicious PowerShell code that employ pretrained contextual embeddings of words from the PowerShell "language". A known problem in the cybersecurity domain is that labeled data is relatively scarce in comparison with unlabeled data, making it difficult to devise effective supervised detection of malicious activity of many types. This is also the case with PowerShell code. Our work shows that this problem can be mitigated by learning a pretrained contextual embedding based on unlabeled data. We trained and evaluated our models using real-world data, collected using AMSI from a large antimalware vendor. Our performance analysis establishes that the use of unlabeled data for the embedding significantly improved the performance of our detectors. Our best-performing model uses an architecture that enables the processing of textual signals from both the character and token levels and obtains a true positive rate of nearly 90% while maintaining a low false-positive rate of less than 0.1%.
CRApr 11, 2018
Detecting Malicious PowerShell Commands using Deep Neural NetworksDanny Hendler, Shay Kels, Amir Rubin
Microsoft's PowerShell is a command-line shell and scripting language that is installed by default on Windows machines. While PowerShell can be configured by administrators for restricting access and reducing vulnerabilities, these restrictions can be bypassed. Moreover, PowerShell commands can be easily generated dynamically, executed from memory, encoded and obfuscated, thus making the logging and forensic analysis of code executed by PowerShell challenging.For all these reasons, PowerShell is increasingly used by cybercriminals as part of their attacks' tool chain, mainly for downloading malicious contents and for lateral movement. Indeed, a recent comprehensive technical report by Symantec dedicated to PowerShell's abuse by cybercrimials reported on a sharp increase in the number of malicious PowerShell samples they received and in the number of penetration tools and frameworks that use PowerShell. This highlights the urgent need of developing effective methods for detecting malicious PowerShell commands.In this work, we address this challenge by implementing several novel detectors of malicious PowerShell commands and evaluating their performance. We implemented both "traditional" natural language processing (NLP) based detectors and detectors based on character-level convolutional neural networks (CNNs). Detectors' performance was evaluated using a large real-world dataset.Our evaluation results show that, although our detectors individually yield high performance, an ensemble detector that combines an NLP-based classifier with a CNN-based classifier provides the best performance, since the latter classifier is able to detect malicious commands that succeed in evading the former. Our analysis of these evasive commands reveals that some obfuscation patterns automatically detected by the CNN classifier are intrinsically difficult to detect using the NLP techniques we applied.